Abstract

A two-terminal interactive distributed source coding problem with alternating messages for function computation 
at both locations is studied. For any number of messages, a computable characterization of the rate region is provided 
in terms of single-letter information measures. While interaction is useless in terms of the minimum sum-rate for 
lossless source reproduction at one or both locations, the gains can be arbitrarily large for function computation even 
when the sources are independent. For a class of sources and functions, interaction is shown to be useless, even with 
infinite messages, when a function has to be computed at only one location, but is shown to be useful, if functions have 
to be computed at both locations. For computing the Boolean AND function of two independent Bernoulli sources at 
both locations, an achievable infinite-message sum-rate with infinitesimal-rate messages is derived in terms of a two- 
dimensional definite integral and a rate-allocation curve. A general framework for multiterminal interactive function 
computation based on an information exchange protocol which successively switches among different distributed 
source coding configurations is developed. For networks with a star topology, multiple rounds of interactive coding
are shown to decrease the scaling law of the total network rate by an order of magnitude as the network grows.

Index Terms 

distributed source coding, function computation, interactive coding, rate-distortion region, Slepian-Wolf coding, 
two-way coding, Wyner-Ziv coding. 

I. Introduction 

In networked systems where distributed inferencing and control need to be performed, the raw-data (source
samples) generated at different nodes (information sources) needs to be transformed and combined in a number of
ways to extract actionable information. This requires performing distributed computations on the source samples.
A pure data-transfer solution approach would advocate first reliably reproducing the source samples at
decision-making nodes and then performing suitable computations to extract actionable information. Two-way interaction
and statistical dependencies among source, destination, and relay nodes would be utilized, if at all, primarily to
improve the reliability of data-reproduction rather than the overall computation-efficiency.

However, to maximize the overall computation-efficiency, it is necessary for nodes to interact bidirectionally,
perform computations, and exploit statistical dependencies in data as opposed to only generating, receiving, and
forwarding data. In this paper we attempt to formalize this common wisdom through some examples of distributed
function-computation problems with the goal of minimizing the total number of bits exchanged per source sample.
Our objective is to highlight the role of interaction in computation-efficiency within a distributed source coding
framework involving block-coding asymptotics and vanishing probability of function-computation error. We derive
information-theoretic characterizations of the set of feasible coding-rates for these problems and explore the
fascinating interplay of function-structure, distribution-structure, and interaction.

$^{1}$This material is based upon work supported by the US National Science Foundation (NSF) under award (CAREER) CCF-0546598. Any
opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the
views of the NSF. A part of this work was presented in ISIT'08.

A. Problem setting 

Consider the following general two-terminal interactive distributed source coding problem with alternating
messages illustrated in Figure 1. Here, $n$ samples $\mathbf{X} := X^n := (X(1), \ldots, X(n)) \in \mathcal{X}^n$ of an information source are
available at location $A$. A different location $B$ has $n$ samples $\mathbf{Y} \in \mathcal{Y}^n$ of a second information source which are
statistically correlated to $\mathbf{X}$. Location $A$ desires to produce a sequence $\hat{\mathbf{Z}}_A \in \mathcal{Z}_A^n$ such that
$d_A^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_A) \leq D_A$, where $d_A^{(n)}$ is a nonnegative distortion function of $3n$ variables. Similarly, location $B$
desires to produce a sequence $\hat{\mathbf{Z}}_B \in \mathcal{Z}_B^n$ such that $d_B^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_B) \leq D_B$. All alphabets are assumed to be finite.
To achieve the desired objective, $t$ coded messages, $M_1, \ldots, M_t$, of respective bit rates (bits per source sample),
$R_1, \ldots, R_t$, are sent alternately from the two locations starting with location $A$ or location $B$. The message sent
from a location can depend on the source samples at that location and on all the previous messages (which are
available to both locations). There is enough memory at both locations to store all the source samples and messages.
An important goal is to characterize the set of all rate $t$-tuples $\mathbf{R} := (R_1, \ldots, R_t)$ for which both
$P(d_A^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_A) > D_A)$ and $P(d_B^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_B) > D_B)$ tend to $0$ as $n \to \infty$. This set of rate-tuples is called
the rate region.




Fig. 1. Interactive distributed source coding with t alternating messages.



B. Related work 

The available literature closely related to this problem can be roughly partitioned into three broad categories. 
The salient features of related problems in these three categories are summarized below using the notation of the 
problem setting described above. 

1) Communication complexity [1]: Here, $\mathbf{X}$ and $\mathbf{Y}$ are typically deterministic, $t$ is not fixed in advance, and
$d_A^{(n)}$ and $d_B^{(n)}$ are the indicator functions of the sets $\{\hat{\mathbf{Z}}_A \neq f_A(\mathbf{X}, \mathbf{Y})\}$ and $\{\hat{\mathbf{Z}}_B \neq f_B(\mathbf{X}, \mathbf{Y})\}$ respectively. Thus, the
goal is to compute the function $f_A(\mathbf{X}, \mathbf{Y})$ at location $A$ and the function $f_B(\mathbf{X}, \mathbf{Y})$ at location $B$. Both deterministic
and randomized coding strategies have been studied. If coding is deterministic, the functions are required to be
computed without error, i.e., $D_A = D_B = 0$. If coding is randomized, with the sources of randomness independent
of each other and of $\mathbf{X}$ and $\mathbf{Y}$, then $\hat{\mathbf{Z}}_A$ and $\hat{\mathbf{Z}}_B$ are random variables. In this case, computation could be required
to be error-free and the termination time $t$ random (the Las-Vegas framework) or the termination time $t$ could
be held fixed but large enough to keep the probability of computation error smaller than some desired value (the
Monte-Carlo framework).

The coding-efficiency for function computation is called communication complexity. When coding is deterministic, 
communication complexity is measured in terms of the minimum value, over all codes, of the total number of bits 
that need to be exchanged between the two locations, to compute the functions without error, irrespective of 
the values of the sources. When coding is randomized, both the worst-case and the expected value of the total 
number of bits, over all sources of randomization, have been considered. The focus of much of the literature has 
been on establishing order-of-magnitude upper and lower bounds for the communication complexity and not on 
characterizing the set of all source coding rate tuples in bits per source sample. In fact, the ranges of $f_A$ and $f_B$
considered in the communication complexity literature are often orders of magnitude smaller than their domains.
This would correspond to a vanishing source coding rate.

Recently, however, Giridhar and Kumar successfully applied the communication complexity framework to study 
how the rate of function computation can scale with the size of the network for deterministic sources [2], [3]. They 
considered a network where each node observes a (deterministic) sequence of source samples and a sink node 
where the sequence of function values needs to be computed. To study how the computation rate scales with the 
network size, they considered the class of connected random planar networks and the class of co-located networks 
and focused on the divisible and symmetric families of functions. 

2) Interactive source reproduction: Kaspi [4] considered a distributed block source coding [5, Section 14.9]
formulation of this problem for discrete memoryless stationary sources taking values in finite alphabets. However,
the focus was on source reproduction with distortion and not function computation. The source reproduction quality
was measured in terms of two single-letter distortion functions of the form
$d_A^{(n)}(\mathbf{x}, \mathbf{y}, \hat{\mathbf{z}}_A) := (1/n) \sum_{i=1}^{n} d_A(y(i), \hat{z}_A(i))$ and $d_B^{(n)}(\mathbf{x}, \mathbf{y}, \hat{\mathbf{z}}_B) := (1/n) \sum_{i=1}^{n} d_B(x(i), \hat{z}_B(i))$.
Coupled single-letter distortion functions of the form $d_A(x(i), y(i), \hat{z}_A(i))$ and $d_B(x(i), y(i), \hat{z}_B(i))$, and
probability of block error for lossless reproduction, were not considered. For a fixed number of messages $t$, a
single-letter characterization of the sum-rate pair $(\sum_{j \text{ odd}} R_j, \sum_{j \text{ even}} R_j)$ (not the entire
rate region) was derived. However, no examples were presented to illustrate the benefits of two-way source coding.
The key question, "does two-way (interactive) distributed source coding with more messages require a strictly smaller
sum-rate than with fewer messages?", was left unanswered.

The recent paper by Yang and He [6] studied two-terminal interactive source coding for the lossless reproduction 
of a stationary non-ergodic source X at B with decoder side-information Y. Here, the code termination criterion 
depended on the sources and previous messages so that t was a random variable. Two-way interactive coding was 
shown to be strictly better than one-way non-interactive coding. 

3) Interactive function computation: In [7], Yamamoto studied the problem where $(\mathbf{X}, \mathbf{Y})$ is a doubly symmetric
binary source,$^{2}$ terminal $B$ is required to compute a Boolean function of the sources satisfying an expected
per-sample Hamming distortion criterion corresponding to $d_B^{(n)}(\mathbf{x}, \mathbf{y}, \hat{\mathbf{z}}_B) := (1/n) \sum_{i=1}^{n} (f_B(x(i), y(i)) \oplus \hat{z}_B(i))$, where $f_B(x, y)$
is a Boolean function, only one message is allowed, i.e., $t = 1$, and nothing is required to be computed at terminal
$A$, i.e., $d_A^{(n)} = 0$. This is equivalent to Wyner-Ziv source coding [5] with decoder side-information for a per-sample
distortion function which depends on the decoder reconstruction and both the sources. Yamamoto computed the
rate-distortion function for all the 16 Boolean functions of two binary variables and showed that they are of only
three forms.

$^{2}$$(X(i), Y(i)) \sim$ iid $p_{XY}(x, y) = 0.5(1 - p)\delta_{xy} + 0.5p(1 - \delta_{xy})$, where $\delta_{ij}$ is the Kronecker delta, and $x, y \in \{0, 1\}$. We say $(X, Y) \sim$ DSBS$(p)$.

In [8], Han and Kobayashi studied a three-terminal problem where X and Y are discrete memoryless stationary 
sources taking values in finite alphabets, X is observed at terminal one and Y at terminal two and terminal three 
wishes to compute a samplewise function of the sources losslessly. Terminals one and two can each send only
one message to terminal three. Han and Kobayashi characterized the class of functions for which the rate region
of this problem coincides with the Slepian-Wolf [5] rate region. 

Orlitsky and Roche [9] studied a distributed block source coding problem whose setup coincides with Kaspi's
problem [4] described above. However, the focus was on computing a samplewise function $f_B(\mathbf{X}, \mathbf{Y}) = (f_B(X(i), Y(i)))_{i=1}^{n}$
of the two sources at terminal $B$ using up to two messages ($t \leq 2$). Nothing was required to be computed at terminal
$A$, i.e., $d_A^{(n)} = 0$. Both probability of block error $P(\hat{\mathbf{Z}}_B \neq f_B(\mathbf{X}, \mathbf{Y}))$ and per-sample expected Hamming distortion
$(1/n) \sum_{i=1}^{n} P(\hat{Z}_B(i) \neq f_B(X(i), Y(i)))$ were considered. A single-letter characterization of the rate region was derived.
Example 8 in [9] showed that the sum-rate with 2 messages is strictly smaller than with one message.

C. Contributions 

We study the two-terminal interactive function computation problem described in Section I-A for discrete
memoryless stationary sources taking values in finite alphabets. The goal is to compute samplewise functions at one
or both locations and the two functions can be the same or different. We focus on a distributed block source coding
formulation involving a probability of block error which is required to vanish as the blocklength tends to infinity.
We derive a computable characterization of the rate region and the minimum sum-rate for any finite number of
messages in terms of single-letter information quantities (Theorem 1 and Corollary 1). We show how the rate regions
for different numbers of messages and different starting locations are nested (Proposition 1). We show how the Markov
chain and conditional entropy constraints associated with the rate region are related to certain geometrical properties
of the support-set of the joint distribution and the function-structure (Lemma 1). This relationship provides a link
to the concept of monochromatic rectangles which has been studied in the communication complexity literature.
We also consider a concurrent kind of interaction where messages are exchanged simultaneously and show how the
minimum sum-rate is bounded by the sum-rate for alternating-message interaction (Proposition 2). We also consider
per-sample average distortion criteria based on coupled single-letter distortion functions which involve the decoder
output and both sources. For expected distortion as well as probability of excess distortion we discuss how the
single-letter characterization of the rate-distortion region is related to the rate region for probability of block error
(Section III-B).

Striking examples are presented to show how the benefit of interactive coding depends on the function-structure,
computation at one/both locations, and the structure of the source distribution. Interactive coding is useless (in terms
of the minimum sum-rate) if the goal is lossless source reproduction at one or both locations but the gains can be
arbitrarily large for computing nontrivial functions involving both sources even when the sources are independent
(Sections IV-A, IV-B, and IV-C). For certain classes of sources and functions, interactive coding is shown to have
no advantage (Theorems 2 and 3). In fact, for doubly symmetric binary sources, interactive coding, with even an
unbounded number of messages, is useless for computing any function at one location (Section IV-D) but is useful
if computation is desired at both locations (Section IV-E). For independent Bernoulli sources, when the Boolean
AND function is required to be computed at both locations, we develop an achievable infinite-message sum-rate
with an infinitesimal rate for each message (Section IV-F). This sum-rate is expressed in analytic closed-form, in
terms of two two-dimensional definite integrals, which represent the total rate flowing in each direction, and a
rate-allocation curve which coordinates the progression of function computation.

We develop a general formulation of multiterminal interactive function computation in terms of an interaction
protocol which switches among many distributed source coding configurations (Section V). We show how results
for the two-terminal problem can be used to develop insights into optimum topologies for information flow in larger
networks through a linear program involving cut-set lower bounds (Sections V-B and V-C). We show that allowing
any arbitrary number of interactive message exchanges over multiple rounds cannot reduce the minimum total rate
for the Korner-Marton problem [10]. For networks with a star topology, however, we show that interaction can, in
fact, decrease the scaling law of the total network rate by an order of magnitude as the network grows (Example 3
in Section V-C).

Notation: In this paper, the terms terminal, node, and location, are synonymous and are used interchangeably. 
The acronym 'iid' stands for independent and identically distributed and 'pmf ' stands for probability mass function. 
Boldface letters such as $\mathbf{x}$, $\mathbf{X}$, etc., are used to denote vectors. Although the dimension of a vector is suppressed
in this notation, it will be clear from the context. With the exception of the symbols $R$, $D$, $N$, $L$, $A$, and $B$, random
quantities are denoted in upper case, e.g., $X$, $\mathbf{X}$, etc., and their specific instantiations are denoted in lower case,
e.g., $X = x$, $\mathbf{X} = \mathbf{x}$, etc. When $X$ denotes a random variable, $X^n$ denotes the ordered tuple $(X_1, \ldots, X_n)$ and $X_m^n$
denotes the ordered tuple $(X_m, \ldots, X_n)$. However, for a set $\mathcal{S}$, $\mathcal{S}^n$ denotes the $n$-fold Cartesian product $\mathcal{S} \times \ldots \times \mathcal{S}$.
The symbol $X(i-)$ denotes $(X(1), \ldots, X(i-1))$ and $X(i+)$ denotes $(X(i+1), \ldots, X(n))$. The indicator function of a set
$\mathcal{S}$, which is equal to one if $x \in \mathcal{S}$ and is zero otherwise, is denoted by $1_{\mathcal{S}}(x)$. The support-set of a pmf $p$ is the set
over which it is strictly positive and is denoted by $\mathrm{supp}(p)$. Symbols $\oplus$, $\wedge$, and $\vee$ represent Boolean XOR, AND,
and OR respectively.

II. Two-terminal interactive function computation 
A. Interactive distributed source code 

We consider two statistically dependent discrete memoryless stationary sources taking values in finite alphabets.
For $i = 1, \ldots, n$, let $(X(i), Y(i)) \sim$ iid $p_{XY}(x, y)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $|\mathcal{X}| < \infty$, $|\mathcal{Y}| < \infty$. Here, $p_{XY}$ is a joint pmf which
describes the statistical dependencies among the samples observed at the two locations at each time instant $i$. Let
$f_A : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}_A$ and $f_B : \mathcal{X} \times \mathcal{Y} \to \mathcal{Z}_B$ be functions of interest at locations $A$ and $B$ respectively, where $\mathcal{Z}_A$ and
$\mathcal{Z}_B$ are finite alphabets. The desired outputs at locations $A$ and $B$ are $\mathbf{Z}_A$ and $\mathbf{Z}_B$ respectively, where for $i = 1, \ldots, n$,
$Z_A(i) := f_A(X(i), Y(i))$ and $Z_B(i) := f_B(X(i), Y(i))$.

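Definition 1: A (two-terminal) interactive distributed source code (for function computation) with initial location
$A$ and parameters $(t, n, |\mathcal{M}_1|, \ldots, |\mathcal{M}_t|)$ is the tuple $(e_1, \ldots, e_t, g_A, g_B)$ of $t$ block encoding functions $e_1, \ldots, e_t$ and
two block decoding functions $g_A, g_B$, of blocklength $n$, where for $j = 1, \ldots, t$,

$$\text{(Enc.}j\text{)} \quad e_j : \begin{cases} \mathcal{X}^n \times \bigotimes_{i=1}^{j-1} \mathcal{M}_i \to \mathcal{M}_j, & \text{if } j \text{ is odd} \\ \mathcal{Y}^n \times \bigotimes_{i=1}^{j-1} \mathcal{M}_i \to \mathcal{M}_j, & \text{if } j \text{ is even} \end{cases}$$

$$\text{(Dec.}A\text{)} \quad g_A : \mathcal{X}^n \times \bigotimes_{j=1}^{t} \mathcal{M}_j \to \mathcal{Z}_A^n,$$

$$\text{(Dec.}B\text{)} \quad g_B : \mathcal{Y}^n \times \bigotimes_{j=1}^{t} \mathcal{M}_j \to \mathcal{Z}_B^n.$$

The output of $e_j$, denoted by $M_j$, is called the $j$-th message, and $t$ is the number of messages. The outputs of $g_A$
and $g_B$ are denoted by $\hat{\mathbf{Z}}_A$ and $\hat{\mathbf{Z}}_B$ respectively. For each $j$, $(1/n) \log_2 |\mathcal{M}_j|$ is called the $j$-th block-coding rate (in
bits per sample).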

Intuitively speaking, $t$ coded messages, $M_1, \ldots, M_t$, are sent alternately from the two locations starting with
location A. The message sent from a location can depend on the source samples at that location and on all the 
previous messages (which are available to both locations from previous message transfers). There is enough memory 
at both locations to store all the source samples and messages. 
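The alternation rule of Definition 1 can be made concrete with a small sketch. The following Python fragment is only an illustration and not part of the coding scheme itself: the function names `sender` and `schedule` and the string labels for sources and messages are hypothetical, introduced here purely to show who transmits the j-th message and what its encoder is allowed to depend on.

```python
def sender(j: int, initial: str = "A") -> str:
    """Location that transmits the j-th message (j = 1, 2, ..., t)."""
    other = "B" if initial == "A" else "A"
    # Odd-numbered messages come from the initial location, even ones from the other.
    return initial if j % 2 == 1 else other


def schedule(t: int, initial: str = "A"):
    """For each message j, return (j, sender, inputs available to encoder e_j)."""
    plan = []
    for j in range(1, t + 1):
        loc = sender(j, initial)
        source = "X^n" if loc == "A" else "Y^n"      # local source block
        previous = [f"M_{i}" for i in range(1, j)]   # all earlier messages
        plan.append((j, loc, [source] + previous))
    return plan


if __name__ == "__main__":
    for row in schedule(3):
        print(row)
```

With initial location A and t = 3, the third encoder sits at location A and may use X^n together with M_1 and M_2, which matches the odd case of (Enc.j).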

We consider two types of fidelity criteria for interactive function computation in this paper. These are 1) probability 
of block error and 2) per-sample distortion. 

B. Probability of block error and operational rate region 

Of interest here are the probabilities of block error $P(\mathbf{Z}_A \neq \hat{\mathbf{Z}}_A)$ and $P(\mathbf{Z}_B \neq \hat{\mathbf{Z}}_B)$ which are multi-letter distortion
functions. The performance of $t$-message interactive coding for function computation is measured as follows.

Definition 2: A rate tuple $\mathbf{R} = (R_1, \ldots, R_t)$ is admissible for $t$-message interactive function computation with
initial location $A$ if, $\forall \epsilon > 0$, $\exists\, N(\epsilon, t)$ such that $\forall n > N(\epsilon, t)$, there exists an interactive distributed source code with
initial location $A$ and parameters $(t, n, |\mathcal{M}_1|, \ldots, |\mathcal{M}_t|)$ satisfying

$$\frac{1}{n} \log_2 |\mathcal{M}_j| \leq R_j + \epsilon, \quad j = 1, \ldots, t,$$

$$P(\mathbf{Z}_A \neq \hat{\mathbf{Z}}_A) \leq \epsilon, \quad P(\mathbf{Z}_B \neq \hat{\mathbf{Z}}_B) \leq \epsilon.$$

The set of all admissible rate tuples, denoted by $\mathcal{R}_t^A$, is called the operational rate region for $t$-message interactive
function computation with initial location $A$. The rate region is closed and convex due to the way it has been
defined. The minimum sum-rate $R_{sum,t}^A$ is given by $\min (\sum_{j=1}^{t} R_j)$ where the minimization is over $\mathbf{R} \in \mathcal{R}_t^A$. For initial
location $B$, the rate region and the minimum sum-rate are denoted by $\mathcal{R}_t^B$ and $R_{sum,t}^B$ respectively.

C. Per-sample distortion and operational rate-distortion region

Let $d_A : \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}_A \to \mathbb{R}^{+}$ and $d_B : \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}_B \to \mathbb{R}^{+}$ be bounded single-letter distortion functions. The fidelity
of function computation can be measured by the per-sample average distortion

$$d_A^{(n)}(\mathbf{x}, \mathbf{y}, \hat{\mathbf{z}}_A) := \frac{1}{n} \sum_{i=1}^{n} d_A(x(i), y(i), \hat{z}_A(i)),$$

$$d_B^{(n)}(\mathbf{x}, \mathbf{y}, \hat{\mathbf{z}}_B) := \frac{1}{n} \sum_{i=1}^{n} d_B(x(i), y(i), \hat{z}_B(i)).$$

Of interest here are either the expected per-sample distortions $E[d_A^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_A)]$ and $E[d_B^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_B)]$ or the
probabilities of excess distortion $P(d_A^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_A) > D_A)$ and $P(d_B^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_B) > D_B)$. Note that although the desired
functions $f_A$ and $f_B$ do not explicitly appear in these fidelity criteria, they are subsumed by $d_A$ and $d_B$ because they
accommodate general relationships between the sources and the outputs of the decoding functions. The performance
of $t$-message interactive coding for function computation is measured as follows.

Definition 3: A rate-distortion tuple $(\mathbf{R}, \mathbf{D}) = (R_1, \ldots, R_t, D_A, D_B)$ is admissible for $t$-message interactive function
computation with initial location $A$ if, $\forall \epsilon > 0$, $\exists\, N(\epsilon, t)$ such that $\forall n > N(\epsilon, t)$, there exists an interactive distributed
source code with initial location $A$ and parameters $(t, n, |\mathcal{M}_1|, \ldots, |\mathcal{M}_t|)$ satisfying

$$\frac{1}{n} \log_2 |\mathcal{M}_j| \leq R_j + \epsilon, \quad j = 1, \ldots, t,$$

$$E[d_A^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_A)] \leq D_A + \epsilon, \quad E[d_B^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_B)] \leq D_B + \epsilon.$$

The set of all admissible rate-distortion tuples, denoted by $\mathcal{RD}_t^A$, is called the operational rate-distortion region for
$t$-message interactive function computation with initial location $A$. The rate-distortion region is closed and convex
due to the way it has been defined. The sum-rate-distortion function $R_{sum,t}^A(\mathbf{D})$ is given by $\min (\sum_{j=1}^{t} R_j)$ where
the minimization is over all $\mathbf{R}$ such that $(\mathbf{R}, \mathbf{D}) \in \mathcal{RD}_t^A$. For initial location $B$, the rate-distortion region and the
minimum sum-rate-distortion function are denoted by $\mathcal{RD}_t^B$ and $R_{sum,t}^B(\mathbf{D})$ respectively.

The admissibility of a rate-distortion tuple can also be defined in terms of the probability of excess distortion
by replacing the expected distortion conditions in Definition 3 by the conditions $P(d_A^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_A) > D_A) \leq \epsilon$ and
$P(d_B^{(n)}(\mathbf{X}, \mathbf{Y}, \hat{\mathbf{Z}}_B) > D_B) \leq \epsilon$. Although these conditions appear to be more stringent,$^{3}$ it can be shown$^{4}$ that they lead
to the same operational rate-distortion region. For simplicity, we focus on the expected distortion conditions as in
Definition 3.
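As a small numerical illustration of the per-sample average distortion of this subsection, the sketch below evaluates $d^{(n)}$ for a coupled single-letter distortion function that depends on both sources and on the decoder output. The helper names `per_sample_distortion` and `hamming_for`, and the choice of the Boolean AND function with a Hamming-type penalty, are assumptions made only for this example.

```python
def per_sample_distortion(d, x, y, z_hat):
    """Average of the single-letter distortion d(x(i), y(i), z_hat(i)) over the block."""
    n = len(x)
    return sum(d(xi, yi, zi) for xi, yi, zi in zip(x, y, z_hat)) / n


def hamming_for(f):
    """Coupled distortion: 0 if the output equals f(x, y), 1 otherwise."""
    return lambda x, y, z: 0 if z == f(x, y) else 1


if __name__ == "__main__":
    f_and = lambda x, y: x & y          # Boolean AND of the two samples
    x = [1, 0, 1, 1]
    y = [1, 1, 0, 1]
    z_hat = [1, 0, 1, 1]                # decoder output with one wrong sample (index 2)
    print(per_sample_distortion(hamming_for(f_and), x, y, z_hat))  # 0.25
```

The function of interest never appears in `per_sample_distortion` itself; it enters only through the distortion function, which is the sense in which $f_A$ and $f_B$ are subsumed by $d_A$ and $d_B$.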



D. Discussion 

For a $t$-message interactive distributed source code, if $|\mathcal{M}_t| = 1$, then $M_t$ = constant (null message) and nothing
needs to be sent in the last step and the $t$-message code reduces to a $(t-1)$-message code. Thus the $(t-1)$-message
rate region is contained within the $t$-message rate region. For generality and convenience, $|\mathcal{M}_j| = 1$ is allowed for
all $j \leq t$. The following proposition summarizes some key properties of the rate regions which are needed in the
sequel.

Proposition 1: (i) If $(R_1, \ldots, R_{t-1}) \in \mathcal{R}_{t-1}^A$, then $(R_1, \ldots, R_{t-1}, 0) \in \mathcal{R}_t^A$. Hence $R_{sum,(t-1)}^A \geq R_{sum,t}^A$.
(ii) If $(R_1, \ldots, R_{t-1}) \in \mathcal{R}_{t-1}^B$, then $(0, R_1, \ldots, R_{t-1}) \in \mathcal{R}_t^A$. Hence $R_{sum,(t-1)}^B \geq R_{sum,t}^A$. Similarly, $R_{sum,(t-1)}^A \geq R_{sum,t}^B$.
(iii) $\lim_{t \to \infty} R_{sum,t}^A = \lim_{t \to \infty} R_{sum,t}^B =: R_{sum,\infty}$.

Proof: (i) Any $(t-1)$-message code with initial location $A$ can be regarded as a special case of a $t$-message code
with initial location $A$ by taking $|\mathcal{M}_t| = 1$. (ii) Any $(t-1)$-message code with initial location $B$ can be regarded
as a special case of a $t$-message code with initial location $A$ by taking $|\mathcal{M}_1| = 1$. (iii) From (i), $R_{sum,t}^A$ and $R_{sum,t}^B$ are
nonincreasing in $t$ and bounded from below by zero, so the limits exist. From (ii), $R_{sum,(t-1)}^A \geq R_{sum,t}^B \geq R_{sum,(t+1)}^A$,
hence the limits are equal. ■

Proposition 1 is also true for any fixed distortion levels $(D_A, D_B)$ if we replace rate regions and minimum sum-rates
in the proposition by rate-distortion regions and sum-rate-distortion functions respectively.

$^{3}$Any tuple which is admissible according to the probability of excess distortion criteria is also admissible according to the expected distortion
criteria.

$^{4}$Using strong-typicality arguments in the proof of the achievability part of the single-letter characterization of the rate-distortion region.
E. Interaction with concurrent message exchanges

In contrast to the type of interaction described in Section II-A which involves alternating message transfers,
one could also consider another type of interaction which involves concurrent message exchanges. In this type of
interaction, in the $j$-th round of interaction, two messages $M_j^A$ and $M_j^B$ are generated simultaneously by encoding
functions $e_j^A$ (at location $A$) and $e_j^B$ (at location $B$) respectively. These messages are based on the source samples
which are available at each location and on all the previous messages $\{M_i^A, M_i^B\}_{i=1}^{j-1}$ which are available to both
locations from previous rounds of interaction. Then $M_j^A$ and $M_j^B$ are exchanged. In $t$ rounds, $2t$ messages are
transferred. After $t$ rounds of interaction, decoding functions $g^A$ and $g^B$ generate function estimates based on all
the messages and the source samples which are available at locations $A$ and $B$ respectively. We can define the
rate region and the rate-distortion region for concurrent interaction as in Sections II-B and II-C for alternating
interaction. Let $R_{sum,t}^{conc}$ denote the minimum sum-rate for $t$-round interactive function computation with concurrent
message exchanges.

The following proposition shows how the minimum sum-rates for concurrent and alternating types of interaction 
bound each other. This is based on a purely structural comparison of alternating and concurrent modes of interaction. 
Proposition 2: (i) $R_{sum,t}^A \geq R_{sum,t}^{conc} \geq R_{sum,(t+1)}^A$. (ii) $\lim_{t \to \infty} R_{sum,t}^{conc} = \lim_{t \to \infty} R_{sum,t}^A = R_{sum,\infty}$.

Proof: (i) The first inequality holds because any $t$-message interactive code with alternating messages and initial
location $A$ can be regarded as a special case of a $t$-round interactive code with concurrent messages by taking
$|\mathcal{M}_j^A| = 1$ for all even $j$ and $|\mathcal{M}_j^B| = 1$ for all odd $j$.

The second inequality can be proved as follows. Given any $t$-round interactive code with concurrent messages and
encoding functions $\{e_j^A, e_j^B\}_{j=1}^{t}$, one can construct a $(t+1)$-message interactive code with alternating messages as
follows: (1) Set $e_1 := e_1^A$. (2) For $j = 2, \ldots, t$, if $j$ is even, define $e_j$ as the combination of $e_{j-1}^B$ and $e_j^B$, otherwise,
define $e_j$ as the combination of $e_{j-1}^A$ and $e_j^A$. (3) If $t$ is even, set $e_{t+1} := e_t^A$, otherwise set $e_{t+1} := e_t^B$. It can be
verified by induction that the inputs of $\{e_1, \ldots, e_{t+1}\}$ defined in this way are indeed available when these encoding
functions are used. Hence these are valid encoding functions for interactive coding with alternating messages. This
$(t+1)$-message interactive code with alternating messages has the same sum-rate as the original $t$-round interactive
code with concurrent messages. Therefore we have $R_{sum,(t+1)}^A \leq R_{sum,t}^{conc}$.

(ii) This follows from (i). ■
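The construction used for the second inequality can be checked mechanically for small t. The Python sketch below is an illustration under stated assumptions: a concurrent encoder is represented only by a label (location, round), the function names `alternating_from_concurrent` and `is_valid` are hypothetical, and "availability of inputs" is modelled as having already received every message of the other location from earlier rounds (and having already computed one's own).

```python
def alternating_from_concurrent(t):
    """Bundle the 2t concurrent encoders (location, round) into t + 1 alternating messages."""
    msgs = [("A", [("A", 1)])]                       # step (1): e_1 := e_1^A
    for j in range(2, t + 1):                        # step (2)
        loc = "B" if j % 2 == 0 else "A"
        msgs.append((loc, [(loc, j - 1), (loc, j)]))
    last = "A" if t % 2 == 0 else "B"                # step (3)
    msgs.append((last, [(last, t)]))
    return msgs


def is_valid(t, msgs):
    """Check alternation, input availability, and that each concurrent message is sent once."""
    received = {"A": set(), "B": set()}   # messages of the other location already received
    computed = {"A": set(), "B": set()}   # own concurrent messages already generated
    seen = []
    for k, (loc, bundle) in enumerate(msgs):
        if loc != ("A" if k % 2 == 0 else "B"):
            return False
        other = "B" if loc == "A" else "A"
        for who, r in bundle:
            if who != loc:
                return False
            # Round-r encoder needs all messages of rounds 1..r-1 from both locations.
            if not {(other, i) for i in range(1, r)} <= received[loc]:
                return False
            if not {(loc, i) for i in range(1, r)} <= computed[loc]:
                return False
            computed[loc].add((who, r))
        received[other].update(bundle)
        seen.extend(bundle)
    expected = {(loc, r) for loc in "AB" for r in range(1, t + 1)}
    return len(seen) == len(expected) and set(seen) == expected


if __name__ == "__main__":
    print(alternating_from_concurrent(3))
    print(all(is_valid(t, alternating_from_concurrent(t)) for t in range(1, 10)))
```

Because every concurrent message is forwarded exactly once, the alternating code built this way carries the same total number of bits as the concurrent code, which is the sum-rate claim in the proof.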

Although a $t$-round interactive code with concurrent messages uses $2t$ messages, the sum-rate performance is
bounded by that of an alternating-message code with only $(t + 1)$ messages. When $t$ is large, the benefit of
concurrent interaction over alternating interaction disappears. For this reason, and because for two-terminal function
computation it is easier to describe results for alternating interaction, in Sections III and IV our discussion will be
confined to alternating interaction. For multiterminal function computation, however, the framework of concurrent
interaction becomes more convenient. Hence in Section V we consider multiterminal function computation problems
with concurrent interaction.

III. Rate region 

A. Probability of block error 

When the probability of block error is used to measure the quality of function computation, the rate region for
$t$-message interactive distributed source coding with alternating messages can be characterized in terms of
single-letter mutual information quantities involving auxiliary random variables satisfying conditional entropy constraints
and Markov chain constraints. This characterization is provided by Theorem 1.
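Theorem 1:

$$
\mathcal{R}_t^A = \Big\{ \mathbf{R} \;\Big|\; \exists\, U^t, \text{ s.t. } \forall i = 1, \ldots, t, \quad
R_i \geq \begin{cases} I(X; U_i \mid Y, U^{i-1}), & U_i - (X, U^{i-1}) - Y, \quad i \text{ odd} \\ I(Y; U_i \mid X, U^{i-1}), & U_i - (Y, U^{i-1}) - X, \quad i \text{ even} \end{cases}
$$
$$
H(f_A(X, Y) \mid X, U^t) = 0, \quad H(f_B(X, Y) \mid Y, U^t) = 0 \Big\}, \tag{3.1}
$$

where $U^t$ are auxiliary random variables taking values in alphabets with the cardinalities bounded as follows,

$$
|\mathcal{U}_j| \leq \begin{cases} |\mathcal{X}| \left( \prod_{i=1}^{j-1} |\mathcal{U}_i| \right) + t - j + 3, & j \text{ odd}, \\ |\mathcal{Y}| \left( \prod_{i=1}^{j-1} |\mathcal{U}_i| \right) + t - j + 3, & j \text{ even}. \end{cases} \tag{3.2}
$$

It should be noted that the right side of (3.1) is convex and closed. This is because $\mathcal{R}_t^A$ is convex and closed and
Theorem 1 shows that the right side of (3.1) is the same as $\mathcal{R}_t^A$. In fact the convexity and closedness of the right
side of (3.1) can be shown directly without appealing to Theorem 1 and the properties of $\mathcal{R}_t^A$. This is explained at
the end of Appendix I.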

The proof of achievability follows from standard random coding and random binning arguments as in the source
coding with side information problem studied by Wyner, Ziv, Gray, Ahlswede, and Korner [5] (also see Kaspi
[4]). We only develop the intuition and informally sketch the steps leading to the proof of achievability. The key
idea is to use a sequence of "Wyner-Ziv-like" codes. First, Enc.1 quantizes $\mathbf{X}$ to $\mathbf{U}_1 \in (\mathcal{U}_1)^n$ using a random
codebook-1. The codewords are further randomly distributed into bins and the bin index of $\mathbf{U}_1$ is sent to location
$B$. Enc.2 identifies $\mathbf{U}_1$ from the bin with the help of $\mathbf{Y}$ as decoder side-information. Next, Enc.2 jointly quantizes
$(\mathbf{Y}, \mathbf{U}_1)$ to $\mathbf{U}_2 \in (\mathcal{U}_2)^n$ using a random codebook-2. The codewords are randomly binned and the bin index of
$\mathbf{U}_2$ is sent to location $A$. Enc.3 identifies $\mathbf{U}_2$ from the bin with the help of $(\mathbf{X}, \mathbf{U}_1)$ as decoder side-information.
Generally, for the $j$-th message, $j$ odd, Enc.$j$ jointly quantizes $(\mathbf{X}, \mathbf{U}^{j-1})$ to $\mathbf{U}_j \in (\mathcal{U}_j)^n$ using a random codebook-$j$.
The codewords are randomly binned and the bin index of $\mathbf{U}_j$ is sent to location $B$. Enc.$(j+1)$ identifies $\mathbf{U}_j$ from
the bin with the help of $(\mathbf{Y}, \mathbf{U}^{j-1})$ as decoder side-information. If $j$ is even, interchange the roles of locations $A$
and $B$ and sources $\mathbf{X}$ and $\mathbf{Y}$ in the procedure for an odd $j$. Note that $H(f_A(X, Y) \mid X, U^t) = 0$ implies the existence
of a deterministic function $\phi_A$ such that $\phi_A(X, U^t) = f_A(X, Y)$. At the end of $t$ messages, Dec.$A$ produces $\hat{\mathbf{Z}}_A$ by
$\hat{Z}_A(i) = \phi_A(X(i), U^t(i))$, $\forall i = 1, \ldots, n$. Similarly, Dec.$B$ produces $\hat{\mathbf{Z}}_B$. The rate and Markov chain constraints ensure
that all quantized codewords are jointly strongly typical with the sources and are recovered with a probability which
tends to one as $n \to \infty$. The conditional entropy constraints ensure that the corresponding block error probabilities
for function computation go to zero as the blocklength tends to infinity.

The (weak) converse is proved in Appendix I following [4] using standard information inequalities, suitably
defining auxiliary random variables, and using convexification (time-sharing) arguments. The conditional entropy 
constraints are established using Fano's inequality as in [8, Lemma 1]. The proof of cardinality bounds for the 
alphabets of the auxiliary random variables is also sketched. ■ 

Corollary 1: For all $t$,

$$\text{(i)} \quad R_{sum,t}^A = \min_{U^t} \left[ I(X; U^t \mid Y) + I(Y; U^t \mid X) \right], \tag{3.3}$$

$$\text{(ii)} \quad R_{sum,t}^A \geq H(f_B(X, Y) \mid Y) + H(f_A(X, Y) \mid X), \tag{3.4}$$

where in (i) $U^t$ are subject to all the Markov chain and conditional entropy constraints in (3.1) and the cardinality
bounds given by (3.2).
Proof: For (i), add all the rate inequalities in (3.1) enforcing all the constraints. Inequality (ii) can be proved
either using (3.3) and relaxing the Markov chain constraints, or using the following cut-set bound argument. If $\mathbf{Y}$
is also available at location $A$, then $\mathbf{Z}_B = f_B(\mathbf{X}, \mathbf{Y})$ can be computed at location $A$. Hence by the converse part of
the Slepian-Wolf theorem [5], the sum-rate of all messages from $A$ to $B$ must be at least $H(f_B(X, Y) \mid Y)$ for $B$ to
form $\hat{\mathbf{Z}}_B$. Similarly, the sum-rate of all messages from $B$ to $A$ must be at least $H(f_A(X, Y) \mid X)$. ■
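The cut-set lower bound (3.4) is straightforward to evaluate numerically for a given joint pmf and pair of functions. The sketch below is an illustrative calculation only; the helper names `cond_entropy_of_function` and `cutset_bound`, and the dictionary representation of $p_{XY}$, are assumptions made for this example. It evaluates the bound for the Boolean AND of two independent Bernoulli(1/2) sources and, for comparison, for lossless reproduction of each source at the other location.

```python
from collections import defaultdict
from math import log2


def cond_entropy_of_function(pxy, f, given):
    """H(f(X,Y) | X) if given == 'x', or H(f(X,Y) | Y) if given == 'y', in bits."""
    joint = defaultdict(float)   # pmf of (side information, f(x, y))
    marg = defaultdict(float)    # pmf of the side information alone
    for (x, y), p in pxy.items():
        side = x if given == "x" else y
        joint[(side, f(x, y))] += p
        marg[side] += p
    return -sum(p * log2(p / marg[s]) for (s, _), p in joint.items() if p > 0)


def cutset_bound(pxy, f_A, f_B):
    """Right side of (3.4): H(f_B(X,Y) | Y) + H(f_A(X,Y) | X)."""
    return cond_entropy_of_function(pxy, f_B, "y") + cond_entropy_of_function(pxy, f_A, "x")


if __name__ == "__main__":
    # Independent Bernoulli(1/2) sources (an assumed example distribution).
    pxy = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
    f_and = lambda x, y: x & y
    print(cutset_bound(pxy, f_and, f_and))                        # AND at both locations
    print(cutset_bound(pxy, lambda x, y: y, lambda x, y: x))      # source reproduction
```

For this distribution the bound evaluates to 1 bit per sample for AND at both locations and 2 bits per sample for reproducing both sources, since $H(X \wedge Y \mid Y) = H(X \wedge Y \mid X) = 1/2$ while $H(X \mid Y) = H(Y \mid X) = 1$. This is only the lower bound (3.4), not the minimum sum-rate itself.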

Although (3.1) and (3.3) provide computable single-letter characterizations of $\mathcal{R}^A_t$ and $R^A_{sum,t}$,
respectively, for all finite $t$, they do not provide a characterization for $R_{sum,\infty}$ in terms of computable
single-letter information quantities. This is because the cardinality bounds for the alphabets of the auxiliary
random variables $U^t$, given by (3.2), grow with $t$.

The Markov chain and conditional entropy constraints of (3.1) imply certain structural properties which the
support-set of the joint distribution of the source and auxiliary random variables needs to satisfy. These properties
are formalized below in Lemma 1. This lemma provides a bridge between certain concepts which have played
a key role in the communication complexity literature [1] and distributed source coding theory. In order to state
the lemma, we need to introduce some terminology used in the communication complexity literature [1]. This is
adapted to our framework and notation. A subset $\mathcal{A} \subseteq \mathcal{X} \times \mathcal{Y}$ is called
$f$-monochromatic if the function $f$ is constant on $\mathcal{A}$. A subset $\mathcal{A} \subseteq \mathcal{X} \times
\mathcal{Y}$ is called a rectangle if $\mathcal{A} = S_X \times S_Y$ for some $S_X \subseteq \mathcal{X}$ and some
$S_Y \subseteq \mathcal{Y}$. Subsets of the form $\{x\} \times S_Y$, $x \in S_X$, are called rows and subsets of the
form $S_X \times \{y\}$, $y \in S_Y$, are called columns of the rectangle $\mathcal{A} = S_X \times S_Y$. By definition,
the empty set is simultaneously a rectangle, a row, and a column. If each row of a rectangle $\mathcal{A}$ is
$f$-monochromatic, then $\mathcal{A}$ is said to be row-wise $f$-monochromatic. Similarly, if each column of a rectangle
$\mathcal{A}$ is $f$-monochromatic, then $\mathcal{A}$ is said to be column-wise $f$-monochromatic. Clearly, if
$\mathcal{A}$ is both row-wise and column-wise $f$-monochromatic, then it is an $f$-monochromatic subset of
$\mathcal{X} \times \mathcal{Y}$.
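The terminology above can be made concrete with a small sketch. The helper names (`is_rectangle`, `is_monochromatic`, and so on) are illustrative assumptions and do not appear in the paper; subsets of $\mathcal{X} \times \mathcal{Y}$ are represented as Python sets of pairs, and the Boolean AND function on $\{0,1\}^2$ serves as the running example.

```python
from itertools import product


def is_rectangle(subset):
    """True if subset equals S_X x S_Y for some S_X and S_Y (the empty set qualifies)."""
    s_x = {x for x, _ in subset}
    s_y = {y for _, y in subset}
    return set(subset) == set(product(s_x, s_y))


def is_monochromatic(subset, f):
    """True if f takes at most one value on subset."""
    return len({f(x, y) for x, y in subset}) <= 1


def is_row_wise_monochromatic(subset, f):
    """True if f is constant on every row {x} x S_Y of subset."""
    rows = {x for x, _ in subset}
    return all(is_monochromatic({(a, b) for a, b in subset if a == x}, f) for x in rows)


def is_column_wise_monochromatic(subset, f):
    """True if f is constant on every column S_X x {y} of subset."""
    cols = {y for _, y in subset}
    return all(is_monochromatic({(a, b) for a, b in subset if b == y}, f) for y in cols)


def boolean_and(x, y):
    return x & y


full_square = set(product((0, 1), repeat=2))   # a rectangle, but not AND-monochromatic
zero_row = {(0, 0), (0, 1)}                    # the row x = 0: AND is constant here
diagonal = {(0, 0), (1, 1)}                    # not of the form S_X x S_Y

print(is_rectangle(full_square), is_row_wise_monochromatic(full_square, boolean_and))
print(is_rectangle(zero_row), is_monochromatic(zero_row, boolean_and))
print(is_rectangle(diagonal))
```

For the AND function, the full square fails to be row-wise monochromatic because the row $x = 1$ contains both function values, whereas the row $x = 0$ is a monochromatic rectangle.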

Lemma 1: Let $U^t$ be any set of auxiliary random variables satisfying the Markov chain and conditional entropy
constraints of (3.1). Let $\mathcal{A}(u^t) := \{(x,y) \,|\, p_{XYU^t}(x,y,u^t) > 0\}$ denote the projection of the
$u^t$-slice of supp$(p_{XYU^t})$ onto $\mathcal{X} \times \mathcal{Y}$. If supp$(p_{XY}) = \mathcal{X} \times \mathcal{Y}$,
then for all $u^t$, the following four conditions hold. (i) $\mathcal{A}(u^t)$ is a rectangle. (ii) $\mathcal{A}(u^t)$
is row-wise $f_A$-monochromatic. (iii) $\mathcal{A}(u^t)$ is column-wise $f_B$-monochromatic. (iv) If, in addition,
$f_A = f_B = f$, then $\mathcal{A}(u^t)$ is $f$-monochromatic.

Proof: (i) The Markov chains in (3.1) induce the following factorization of the joint pmf.

$p_{XYU^t}(x,y,u^t) = p_{XY}(x,y) \cdot p_{U_1|X}(u_1|x) \cdot p_{U_2|YU_1}(u_2|y,u_1) \cdot p_{U_3|XU^2}(u_3|x,u^2) \cdots$

$=: p_{XY}(x,y)\, \phi_X(x,u^t)\, \phi_Y(y,u^t)$,

where $\phi_X$ is the product of all the factors having conditioning on $x$ and $\phi_Y$ is the product of all the factors
having conditioning on $y$. Let $S_X(u^t) := \{x \,|\, \phi_X(x,u^t) > 0\}$ and $S_Y(u^t) := \{y \,|\, \phi_Y(y,u^t) > 0\}$.
Since $p_{XY}(x,y) > 0$ for all $x$ and $y$, $\mathcal{A}(u^t) = S_X(u^t) \times S_Y(u^t)$. (ii) This follows from the
conditional entropy constraint $H(f_A(X,Y)|X,U^t) = 0$ in (3.1). (iii) This follows from the conditional entropy
constraint $H(f_B(X,Y)|Y,U^t) = 0$ in (3.1). (iv) This follows from parts (ii) and (iii) of this lemma. ■
Note that $\mathcal{A}(u^t)$ is the empty set if, and only if, $p_{U^t}(u^t) = 0$. The above lemma holds for all values
of $t$. The fact that the set $\mathcal{A}(u^t)$ has a rectangular shape is a consequence of the fact that the auxiliary
random variables $U^t$ need to satisfy the Markov chain constraints in (3.1). These Markov chain constraints are in
turn consequences of the structural constraints which are inherent to the coding process: messages alternate from
one terminal to the other and can depend on only the source samples and all the previously received messages which
are available at a terminal. The rectangular property depends "less directly" on the function-structure than on the
structure of the coding process and the structure of the joint source distribution. On the other hand, the fact that
$\mathcal{A}(u^t)$ is row-wise and/or column-wise monochromatic is a consequence of the fact that the auxiliary random
variables $U^t$ need to satisfy the conditional entropy constraints in (3.1). This property is more closely tied to
the structure of the function and the structure of the joint distribution of sources. Lemma 1 will be used to prove
Theorem 2 and Theorem 4 in the sequel.

B. Rate-distortion region 

When per-sample distortion criteria are used, the single-letter characterization of the rate-distortion region is
given by Theorem 1 with the conditional entropy constraints in (3.1) replaced by the following expected distortion
constraints: there exist deterministic functions $g_A$ and $g_B$, such that $E[d_A(X,Y,g_A(X,U^t))] \leq D_A$ and
$E[d_B(X,Y,g_B(Y,U^t))] \leq D_B$. The proof of achievability is similar to that of Theorem 1. The distortion constraints
get satisfied "automatically" by using strongly typical sets in the random coding and binning arguments. The
proof of the converse given in Appendix I will continue to hold if equations (I.4) and (I.5) are replaced by
$E[d_A^{(n)}(\mathbf{X},\mathbf{Y},\hat{\mathbf{Z}}_A)] \leq D_A + \epsilon$ and
$E[d_B^{(n)}(\mathbf{X},\mathbf{Y},\hat{\mathbf{Z}}_B)] \leq D_B + \epsilon$ respectively and the subsequent steps in the
proof are changed appropriately.

The following proposition clarifies the relationship between the rate region for probability of block error and the 
rate-distortion region. 

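Proposition 3: Let $d_H$ denote the Hamming distortion function. If $d_A(x,y,\hat{z}_A) = d_H(f_A(x,y),\hat{z}_A)$,
$d_B(x,y,\hat{z}_B) = d_H(f_B(x,y),\hat{z}_B)$, and $D_A = D_B = 0$, then
$\{\mathbf{R} \,|\, (\mathbf{R},0,0) \in \mathcal{RD}^A_t\} = \mathcal{R}^A_t$.

Proof: In order to show that $\{\mathbf{R} \,|\, (\mathbf{R},0,0) \in \mathcal{RD}^A_t\} \supseteq \mathcal{R}^A_t$, note
that $\forall \mathbf{R} \in \mathcal{R}^A_t$, we have
$\epsilon \geq P(\mathbf{Z}_A \neq \hat{\mathbf{Z}}_A) \geq E[d_A^{(n)}(\mathbf{X},\mathbf{Y},\hat{\mathbf{Z}}_A)]$ and
$\epsilon \geq P(\mathbf{Z}_B \neq \hat{\mathbf{Z}}_B) \geq E[d_B^{(n)}(\mathbf{X},\mathbf{Y},\hat{\mathbf{Z}}_B)]$ for the
distortion function assumed in the statement of the proposition. Therefore $(\mathbf{R},0,0) \in \mathcal{RD}^A_t$.

In order to show that $\{\mathbf{R} \,|\, (\mathbf{R},0,0) \in \mathcal{RD}^A_t\} \subseteq \mathcal{R}^A_t$, note that
$\forall \mathbf{R}$ such that $(\mathbf{R},0,0) \in \mathcal{RD}^A_t$, we have
$d_A(X,Y,g_A(X,U^t)) = d_H(f_A(X,Y),g_A(X,U^t)) = 0$, which implies $f_A(X,Y) = g_A(X,U^t)$, which in turn implies
$H(f_A(X,Y)|X,U^t) = 0$. Similarly, we have $H(f_B(X,Y)|Y,U^t) = 0$. Therefore $\mathbf{R} \in \mathcal{R}^A_t$. ■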

Although the proof of the single-letter characterization of $\mathcal{RD}^A_t$ implies the proof of Theorem 1 for
$\mathcal{R}^A_t$, since the focus of this paper is on probability of block error and the proofs for
$\mathcal{RD}^A_t$ are very similar, we provide the detailed converse proof only for Theorem 1 for $\mathcal{R}^A_t$.

IV. Examples 

Does interaction really help? In other words, does interactive coding with more messages strictly outperform
coding with fewer messages in terms of the sum-rate? When only one nontrivial function has to be computed at only
one location, at least one message is needed. In this situation, interaction will be considered to be "useful" if there
exists $t > 1$ such that $R^A_{sum,t} < R^A_{sum,1}$. When nontrivial functions have to be computed at both locations,
at least two messages are needed, one going from A to B and the other from B to A. Since messages go in both directions,
a two-message code can be potentially considered to be interactive. However, this is a trivial form of interaction
because function computation is impossible without two messages. Therefore, in this situation, interaction will be
considered to be useful if there exists $t > 2$ such that $R^A_{sum,t} < R^A_{sum,2}$. Corollary 1 does not directly tell
us if or when interaction is useful. In this section we explore the value of interaction in different scenarios through
some striking examples. Interaction does help in Examples IV-C, IV-E, and IV-F and does not (even with infinite
messages) in Examples IV-A, IV-B, and IV-D.

A. Interaction is useless for reproducing one source at one location: $f_A(x,y) := 0$, $f_B(x,y) := x$.

Only $X$ needs to be reproduced at location B. Unless $H(X|Y) = 0$, at least one message is necessary. From (3.4),
$\forall t \geq 1$, $R^A_{sum,t} \geq H(X|Y)$. But $R^A_{sum,1} = H(X|Y)$ by Slepian-Wolf coding [5] with $X$ as source
and $Y$ as decoder side information. Hence, by Proposition 1(i), $R^A_{sum,t} = R^A_{sum,1} = H(X|Y)$ for all $t \geq 1$.

B. Interaction is useless for reproducing both sources at both locations: $f_A(x,y) := y$, $f_B(x,y) := x$.

Unless $H(X|Y) = 0$ or $H(Y|X) = 0$, at least two messages are necessary. From (3.4), $\forall t \geq 2$,
$R^A_{sum,t} \geq H(X|Y) + H(Y|X)$. But $R^A_{sum,2} = H(X|Y) + H(Y|X)$ by Slepian-Wolf coding, first with $X$ as source
and $Y$ as decoder side information and then vice-versa. Hence, by Proposition 1(i),
$R^A_{sum,t} = R^A_{sum,2} = H(X|Y) + H(Y|X)$ for all $t \geq 2$.

Examples IV-A and IV-B show that if the goal is source reproduction with vanishing distortion, interaction is
useless.* To discover the value of interaction, we must study either nonzero distortions or functions which involve
both sources. Our focus is on the latter.

C. Benefit of interaction can be arbitrarily large for function computation: $X \perp Y$, $X \sim$ Uniform$\{1,\ldots,L\}$,
$p_Y(1) = 1 - p_Y(0) = p \in (0,1)$, $f_A(x,y) := 0$, $f_B(x,y) := xy$ (real multiplication).

This is an expanded version of Example 8 in [9]. At least one message is necessary. If $t = 1$, an achievable scheme
is to send $X$ by Slepian-Wolf coding at the rate $H(X|Y) = \log_2 L$ so that the function can be computed at location
B. Although location B is required to compute only the samplewise product and is not required to reproduce $X$, it
turns out, rather surprisingly, that the one-message rate $H(X|Y)$ cannot be decreased. This is a direct consequence
of a lemma due to Han and Kobayashi which we now state by adapting it to our situation and notation.

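Lemma 2: (Han and Kobayashi [8, Lemma 1]) Let supp$(p_{XY}) = \mathcal{X} \times \mathcal{Y}$. If
$\forall x_1, x_2 \in \mathcal{X}$, $x_1 \neq x_2$, there exists $y_0 \in \mathcal{Y}$ such that
$f_B(x_1,y_0) \neq f_B(x_2,y_0)$, then $R^A_{sum,1} \geq H(X|Y)$.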

The condition of Lemma 2 is satisfied in our present example with $y_0 = 1$. Therefore we have
$R^A_{sum,1} = H(X|Y) = \log_2 L$. With one extra message and initial location B, however, $Y$ can be reproduced at
location A by entropy-coding at the rate $R_1 = H(Y) = h_2(p)$ bits per sample. Then, $Z_B$ can be computed at location
A and conveyed to location B via Slepian-Wolf coding at the rate $R_2 = H(f_B(X,Y)|Y) = p \log_2 L$ bits per sample,
where $h_2$ is the binary entropy function. Therefore, $R^B_{sum,2} \leq h_2(p) + p \log_2 L$. The benefit of even one
extra message can be significant: For fixed $L$, $(R^A_{sum,1}/R^B_{sum,2})$ can be made arbitrarily large for suitably
small $p$. For fixed $p$, $(R^A_{sum,1} - R^B_{sum,2})$ can be made arbitrarily large for suitably large $L$.
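The two limiting claims above are easy to tabulate. The following sketch evaluates the one-message rate $\log_2 L$ and the two-message upper bound $h_2(p) + p \log_2 L$ from this example; the function names are illustrative choices, not notation from the paper.

```python
from math import log2


def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)


def one_message_rate(L):
    """One-message minimum sum-rate H(X|Y) = log2(L) for X uniform on L values."""
    return log2(L)


def two_message_rate_bound(L, p):
    """Achievable two-message sum-rate h2(p) + p*log2(L), starting at location B."""
    return h2(p) + p * log2(L)


# Fixed L: the ratio of the two rates grows as p shrinks.
for p in (0.1, 0.01, 0.001):
    ratio = one_message_rate(1024) / two_message_rate_bound(1024, p)
    print(f"L=1024, p={p}: ratio = {ratio:.1f}")

# Fixed p: the difference of the two rates grows as L grows.
for L in (2 ** 4, 2 ** 16, 2 ** 64):
    gap = one_message_rate(L) - two_message_rate_bound(L, 0.1)
    print(f"p=0.1, L=2^{int(log2(L))}: gap = {gap:.2f} bits per sample")
```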

Extrapolating from this example, one might be led to believe that the benefit of interaction arises due to computing 
nontrivial functions which involve both sources as opposed to reproducing the sources themselves. In other words, 
the function-structure determines whether interaction is beneficial or not (recall that the sources were independent
in this example). However, the structure of the joint distribution plays an equally important role and this aspect 
will be highlighted in the next example. 

* However, interaction can prove useful for source reproduction when it is either required to be error-free [11], [12]
or when the sources are stationary but non-ergodic [6].

D. Interaction can be useless for computing any function at one location: $Y = X \oplus W$, $X \perp W$, $X \sim$ Ber$(q)$,
$W \sim$ Ber$(p)$, $f_A(x,y) := 0$, $f_B(x,y) :=$ any function.

If $f_B(x,y)$ does not depend on $x$, i.e., there exists a function $f'$ such that $f_B(x,y) = f'(y)$, no communication is
needed and interaction does not help.

If $f_B(x,y)$ depends on $x$, then $\exists\, y_0 \in \{0,1\}$ such that $f_B(0,y_0) \neq f_B(1,y_0)$. Theorem 2 below,
proved in Appendix II, shows that interaction does not help even with infinite messages.

Theorem 2: Let $f_A(x,y) := 0$ and let $Y = X \oplus W$, with $X \perp W$, $X \sim$ Ber$(q)$, and $W \sim$ Ber$(p)$. If
there exists a $y_0 \in \{0,1\}$ such that $f_B(0,y_0) \neq f_B(1,y_0)$, then for all $t \in \mathbb{Z}^+$,
$R^A_{sum,t} = H(X|Y)$.

Remark: The conclusion of Theorem 2 that interaction does not help cannot be directly deduced from (3.4):
When $f_B(x,y) = x \wedge y$ (Boolean AND), the lower bound in Corollary 1(ii), $H(X \wedge Y|Y) = H(X|Y=1)\,p_Y(1)$, is
strictly less than $H(X|Y)$ $\forall\, 0 < p,q < 1$.

The result of Theorem 2 can be generalized to the following theorem for non-binary sources. The proof of this
theorem is provided in Appendix II immediately after the proof of Theorem 2.

Theorem 3: Let $f_A(x,y) := 0$ and let supp$(p_{XY}) = \mathcal{X} \times \mathcal{Y}$. If (i) the only column-wise
$f_B$-monochromatic rectangles of $\mathcal{X} \times \mathcal{Y}$ are subsets of rows and columns and (ii) there exists
a random variable $W$ and deterministic functions $\psi$ and $\eta$ such that $Y = \psi(X,W)$, $X = \eta(Y,W)$, and
$H(Y|X) = H(W)$,† then for all $t \in \mathbb{Z}^+$, $R^A_{sum,t} = H(X|Y)$.

The examples till this point have highlighted the effects of function-structure and distribution-structure on the 
benefit of interaction. The next example will highlight a slightly different aspect of function-structure associated 
with the situation in which both sides need to compute the same nontrivial function which involves both sources. 
The distribution-structure in the next example will be essentially the same as in Example IV-D but with $q = 1/2$
and $0 < p < 1$, i.e., $(X,Y) \sim$ DSBS$(p)$. However, both locations will need to compute the samplewise Boolean
AND function. Interestingly, in this situation the benefit of interaction returns as explained below. 

E. Interaction can be useful for computing a function of sources at both locations: $(X,Y) \sim$ DSBS$(p)$, $p \in (0,1)$,
$f_A(x,y) = f_B(x,y) := x \wedge y$.

Since both locations need to compute nontrivial functions, at least two messages are needed. In a 2-message code
with initial location A, location B should be able to produce $\hat{Z}_B$ after receiving the first message. By Lemma 2,
$R_1 \geq H(X|Y) = h_2(p)$. With $R_1 = h_2(p)$ and a Slepian-Wolf code with $Y$ as side-information, $X$ can be reproduced
at location B. Thus for the second message, $R_2 = H(f_B(X,Y)|X) = (1/2)h_2(p)$ is both necessary and sufficient to
ensure that location A can produce $\hat{Z}_A$. Hence $R^A_{sum,2} = (3/2)h_2(p)$.

If a third message is allowed, one choice of auxiliary random variables in (3.1) is $U_1 := X \vee W$, $W \sim$ Ber$(1/2)$,
$W \perp (X,Y)$, $U_2 := Y \wedge U_1$, and $U_3 := X \wedge U_2$. Hence $U_3 = X \wedge Y = f_A(X,Y) = f_B(X,Y)
\Rightarrow H(f_A(X,Y)|X,U^3) = H(f_B(X,Y)|Y,U^3) = 0$. Hence,
$R^A_{sum,3} \leq I(X;U^3|Y) + I(Y;U^3|X) = \frac{5}{4}h_2(p) + \frac{1}{2}h_2\left(\frac{1-p}{2}\right) - \frac{1-p}{2}
\overset{(a)}{<} \frac{3}{2}h_2(p) = R^A_{sum,2}$, where step (a) holds for all $p \in (0,1)$ and the gap is maximum for
$p = 1/3$. When $p = 0.5$, $X \perp Y$, and an achievable 3-message sum-rate is $\approx 1.406 < 1.5 = R^A_{sum,2}$.
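The 3-message rate can be checked by brute force. The sketch below enumerates the joint pmf of $(X, Y, U^3)$ for the auxiliary random variables chosen above and evaluates $I(X;U^3|Y) + I(Y;U^3|X)$ directly from entropies. The closed-form expression $\frac{5}{4}h_2(p) + \frac{1}{2}h_2((1-p)/2) - (1-p)/2$ used for comparison is an assumption of this sketch; it agrees with the value 1.406 quoted at $p = 0.5$. All function names are illustrative.

```python
from collections import defaultdict
from itertools import product
from math import log2


def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)


def joint_pmf(p):
    """Joint pmf of (x, y, u) with u = (u1, u2, u3), for (X, Y) ~ DSBS(p), 0 < p < 1."""
    pmf = defaultdict(float)
    for x, y, w in product((0, 1), repeat=3):
        p_xy = 0.5 * (p if x != y else 1.0 - p)
        u1 = x | w   # U1 = X OR W, with W ~ Ber(1/2) independent of (X, Y)
        u2 = y & u1  # U2 = Y AND U1
        u3 = x & u2  # U3 = X AND U2, which equals X AND Y
        pmf[(x, y, (u1, u2, u3))] += 0.5 * p_xy
    return pmf


def cond_entropy(pmf, target, given):
    """H(target | given) in bits; target and given map an outcome to a hashable value."""
    joint = defaultdict(float)
    marginal = defaultdict(float)
    for outcome, prob in pmf.items():
        joint[(target(outcome), given(outcome))] += prob
        marginal[given(outcome)] += prob
    return -sum(prob * log2(prob / marginal[g]) for (_, g), prob in joint.items() if prob > 0)


def three_message_sum_rate(p):
    """I(X;U|Y) + I(Y;U|X) with U = (U1, U2, U3), computed from the joint pmf."""
    pmf = joint_pmf(p)
    h_u_given_xy = cond_entropy(pmf, lambda o: o[2], lambda o: (o[0], o[1]))
    i_x_u_given_y = cond_entropy(pmf, lambda o: o[2], lambda o: o[1]) - h_u_given_xy
    i_y_u_given_x = cond_entropy(pmf, lambda o: o[2], lambda o: o[0]) - h_u_given_xy
    return i_x_u_given_y + i_y_u_given_x


def closed_form(p):
    """Assumed closed form of the 3-message achievable sum-rate."""
    return 1.25 * h2(p) + 0.5 * h2((1 - p) / 2) - (1 - p) / 2


for p in (0.1, 1 / 3, 0.5):
    print(p, round(three_message_sum_rate(p), 4), round(closed_form(p), 4), round(1.5 * h2(p), 4))
```

Each printed row lists the brute-force rate, the closed form, and the 2-message minimum $(3/2)h_2(p)$ for comparison.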

Note that as a special case of Example IV-D, if $(X,Y) \sim$ DSBS$(p)$ and only location B needs to compute the
Boolean AND function, interaction is useless. But if both locations need to compute it, and $p \in (0,1)$, then the
benefit of interaction returns. Motivated by the benefits of using more and more messages, we investigate
infinite-message interaction in the following example.

† It is easy to see that if $Y = \psi(X,W)$, then $H(Y|X) = H(W) \Leftrightarrow X \perp W$ and $H(W|X,Y) = 0$.

F. An achievable infinite-message sum-rate as a definite integral with infinitesimal-rate messages: $X \perp Y$,
$X \sim$ Ber$(p)$, $Y \sim$ Ber$(q)$, $p,q \in (0,1)$, $f_A(x,y) = f_B(x,y) = x \wedge y$.




Fig. 2. (a) 4-message interactive code, (b) $\infty$-message interactive code, (c) $\infty$-message interactive code with
optimal rate-allocation curve when $q > p$.

As in Example IV-E, the 2-message minimum sum-rate is $R^A_{sum,2} = H(X|Y) + H(f_B(X,Y)|X) = h_2(p) + p\,h_2(q)$.
Example IV-E demonstrates the gain of interaction. This inspires us to generalize the 3-message code of
Example IV-E to an arbitrary number of messages and evaluate an achievable infinite-message sum-rate. Since we are
interested in the limit $t \to \infty$, it is sufficient to consider even-valued $t$ due to Proposition 1.

Define real auxiliary random variables $(V_x,V_y) \sim$ Uniform$([0,1]^2)$. If $X := 1_{[1-p,1]}(V_x)$ and
$Y := 1_{[1-q,1]}(V_y)$, then $(X,Y)$ has the correct joint pmf, i.e., $p_X(1) = 1 - p_X(0) = p$,
$p_Y(1) = 1 - p_Y(0) = q$ and $X \perp Y$. We will interpret 0 and 1 as real zero and real one respectively as needed.
This interpretation will allow us to express Boolean arithmetic in terms of real arithmetic. Thus $X \wedge Y$ (Boolean
AND) $= XY$ (real multiplication). Define a rate-allocation curve $\Gamma$ parametrically by
$\Gamma := \{(\alpha(s),\beta(s)),\ 0 \leq s \leq 1\}$ where $\alpha$ and $\beta$ are real, nondecreasing, absolutely
continuous functions with $\alpha(0) = \beta(0) = 0$, $\alpha(1) = (1-p)$, and $\beta(1) = (1-q)$. The significance of
$\Gamma$ will become clear later. Now choose a partition of $[0,1]$, $0 = s_0 < s_1 < \ldots < s_{t/2-1} < s_{t/2} = 1$,
such that $\max_{i=1,\ldots,t/2}(s_i - s_{i-1}) < \Delta_t$. For $i = 1,\ldots,t/2$, define $t$ auxiliary random variables
as follows,

$U_{2i-1} := 1_{[\alpha(s_i),1] \times [\beta(s_{i-1}),1]}(V_x,V_y)$, $\quad U_{2i} := 1_{[\alpha(s_i),1] \times [\beta(s_i),1]}(V_x,V_y)$.

In Figure 2(a), $(V_x,V_y)$ is uniformly distributed on the unit square and $U^t$ are defined to be 1 in rectangular
regions which are nested. The following properties can be verified:

P1: $U_1 \geq U_2 \geq \ldots \geq U_t$.

P2: $H(X \wedge Y|X,U^t) = H(X \wedge Y|Y,U^t) = 0$: since $U_t = 1_{[1-p,1] \times [1-q,1]}(V_x,V_y) = X \wedge Y$.

P3: $U^t$ satisfy all the Markov chain constraints in (3.1): for example, consider $U_{2i} - (Y,U^{2i-1}) - X$.
$U_{2i-1} = 0 \Rightarrow U_{2i} = 0$ and the Markov chain holds. $U_{2i-1} = Y = 1 \Rightarrow (V_x,V_y) \in
[\alpha(s_i),1] \times [1-q,1] \Rightarrow U_{2i} = 1$ and the Markov chain holds. Given $U_{2i-1} = 1$, $Y = 0$,
$(V_x,V_y) \sim$ Uniform$([\alpha(s_i),1] \times [\beta(s_{i-1}),1-q])$ $\Rightarrow$ $V_x$ and $V_y$ are conditionally
independent. Thus $X \perp U_{2i} \,|\, (U_{2i-1} = 1, Y = 0)$ because $X$ is a function of only $V_x$ and $U_{2i}$ is a
function of only $V_y$ upon conditioning. So the Markov chain $U_{2i} - (Y,U^{2i-1}) - X$ holds in all situations.

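P4: $(Y,U_{2i}) \perp X \,|\, U_{2i-1} = 1$: this can be proved by the same method as in P3.

P2 and P3 show that $U^t$ satisfy all the constraints in (3.1). For $i = 1,\ldots,t/2$, the $(2i)$-th rate is given by

$I(Y;U_{2i}|X,U^{2i-1})$
$\overset{P1}{=} I(Y;U_{2i}|X,U_{2i-1}=1)\,p_{U_{2i-1}}(1)$
$\overset{P4}{=} I(Y;U_{2i}|U_{2i-1}=1)\,p_{U_{2i-1}}(1)$
$= H(Y|U_{2i-1}=1)\,p_{U_{2i-1}}(1) - H(Y|U_{2i},U_{2i-1}=1)\,p_{U_{2i-1}}(1)$
$\overset{(b)}{=} H(Y|U_{2i-1}=1)\,p_{U_{2i-1}}(1) - H(Y|U_{2i}=1)\,p_{U_{2i}}(1)$
$= (1-\alpha(s_i)) \left[ (1-\beta(s_{i-1}))\,h_2\!\left(\frac{q}{1-\beta(s_{i-1})}\right) - (1-\beta(s_i))\,h_2\!\left(\frac{q}{1-\beta(s_i)}\right) \right]$
$\overset{(c)}{=} (1-\alpha(s_i)) \int_{\beta(s_{i-1})}^{\beta(s_i)} \log_2\!\left(\frac{1-v_y}{1-q-v_y}\right) dv_y$
$= \iint_{[\alpha(s_i),1] \times [\beta(s_{i-1}),\beta(s_i)]} w_y(v_y,q)\,dv_x\,dv_y$,

where step (b) is due to property P1 and because $(U_{2i-1},U_{2i}) = (1,0) \Rightarrow Y = 0$, hence
$H(Y|U_{2i},U_{2i-1}=1)\,p_{U_{2i-1}}(1) = H(Y|U_{2i}=1,U_{2i-1}=1)\,p_{U_{2i},U_{2i-1}}(1,1) = H(Y|U_{2i}=1)\,p_{U_{2i}}(1)$,
and step (c) is because

$\frac{\partial}{\partial v_y}\left( -(1-v_y)\,h_2\!\left(\frac{q}{1-v_y}\right) \right) = \log_2\!\left(\frac{1-v_y}{1-q-v_y}\right) =: w_y(v_y,q)$.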
The $2i$-th rate can thus be expressed as a 2-D integral of a weight function $w_y$ over the rectangular region
$\mathcal{R}eg(2i) := [\alpha(s_i),1] \times [\beta(s_{i-1}),\beta(s_i)]$ (a horizontal bar in Figure 2(a)). Therefore, the
sum of rates of all messages sent from location B to location A is the integral of $w_y$ over the union of all the
corresponding horizontal bars in Figure 2(a). Similarly, the sum of rates of all messages sent from location A to
location B can be expressed as the integral of another weight function $w_x(v_x,p) := \log_2((1-v_x)/(1-p-v_x))$ over the
union of all the vertical bars in Figure 2(a).

Now let $t \to \infty$ such that $\Delta_t \to 0$. Since $\alpha$ and $\beta$ are absolutely continuous,
$(\alpha(s_i) - \alpha(s_{i-1})) \to 0$ and $(\beta(s_i) - \beta(s_{i-1})) \to 0$. The union of the horizontal (resp.
vertical) bars in Figure 2(a) tends to the region $\mathcal{W}_y$ (resp. $\mathcal{W}_x$) in Figure 2(b). Hence an
achievable infinite-message sum-rate given by

$\iint_{\mathcal{W}_x} w_x(v_x,p)\,dv_x\,dv_y + \iint_{\mathcal{W}_y} w_y(v_y,q)\,dv_x\,dv_y$ (4.5)

depends on only the rate-allocation curve $\Gamma$ which coordinates the progress of source descriptions at A and B.
Since $\mathcal{W}_x \cup \mathcal{W}_y$ is independent of $\Gamma$, (4.5) is minimized when
$\mathcal{W}_x = \mathcal{W}_x^* := \{(v_x,v_y) \in [0,1-p] \times [0,1-q] : w_x(v_x,p) \leq w_y(v_y,q)\} \cup
[0,1-p] \times [1-q,1]$. For $q > p$, the boundary $\Gamma^*$ separating $\mathcal{W}_x^*$ and $\mathcal{W}_y^*$ is given
by the piecewise linear curve connecting $(0,0)$, $((q-p)/q, 0)$, $(1-p,1-q)$ in that order (see Figure 2(c)).
For $\mathcal{W}_x = \mathcal{W}_x^*$, (4.5) can be evaluated in closed form and is given by

$h_2(p) + p\,h_2(q) + p \log_2 q + p(1-q) \log_2 e$. (4.6)

Recall that $R^A_{sum,2} = h_2(p) + p\,h_2(q)$. The difference $p(\log_2 q + (1-q)\log_2 e)$ is an increasing function of
$q$ for $q \in (0,1]$ and equals 0 when $q = 1$. Hence the difference is negative for $q \in (0,1)$. So
$R_{sum,\infty} < R^A_{sum,2}$ and interaction does help. In particular, when $p = q = 1/2$ ($(X,Y) \sim$ iid Ber$(1/2)$),
by an infinite-message code, we can achieve the sum-rate $(1 + (\log_2 e)/4) \approx 1.361$, compared with the 3-message
achievable sum-rate 1.406 and the 2-message minimum sum-rate 1.5 in Example IV-E. It should be noted that for finite
$t$, $\Gamma$ is staircase-like and contains horizontal and vertical segments. However, $\Gamma^*$ contains an oblique
segment. So the code with finite $t$ generated in this way never achieves the infinite-message sum-rate. It can be
approximated only when $t \to \infty$ and each message uses an infinitesimal rate.
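As a numerical sanity check on this example, the sketch below evaluates the integral (4.5) over the optimal regions and compares it with the closed form (4.6). It assumes $q \geq p$ and reduces each 2-D integral to a 1-D integral by slicing: at abscissa $v_x$ the region $\mathcal{W}_x^*$ spans $v_y$ from the boundary $\Gamma^*$ up to 1, and at ordinate $v_y$ the region $\mathcal{W}_y^*$ spans $v_x$ from $\Gamma^*$ up to 1. This slicing, the midpoint rule, and all function names are choices made for this sketch rather than part of the paper.

```python
from math import e, log2


def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)


def w(v, r):
    """Weight function log2((1 - v) / (1 - r - v)), defined for 0 <= v < 1 - r."""
    return log2((1.0 - v) / (1.0 - r - v))


def midpoint_rule(f, a, b, n=20000):
    """Midpoint quadrature; midpoints avoid the integrable log singularity at b."""
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))


def sum_rate_two_messages(p, q):
    """2-message minimum sum-rate h2(p) + p*h2(q)."""
    return h2(p) + p * h2(q)


def sum_rate_closed_form(p, q):
    """Closed form (4.6) of the infinite-message achievable sum-rate."""
    return h2(p) + p * h2(q) + p * log2(q) + p * (1.0 - q) * log2(e)


def sum_rate_numeric(p, q):
    """Numerical value of (4.5) over the optimal regions, assuming q >= p."""
    def boundary_vy(vx):
        # Gamma* as a function of vx: zero up to (q - p)/q, then linear up to 1 - q.
        return max(0.0, (q * vx - (q - p)) / p)

    # Messages from A to B: integrate w_x over W_x*, sliced along vx.
    a_to_b = midpoint_rule(lambda vx: w(vx, p) * (1.0 - boundary_vy(vx)), 0.0, 1.0 - p)
    # Messages from B to A: integrate w_y over W_y*, whose slice at vy has length (p/q)(1 - vy).
    b_to_a = midpoint_rule(lambda vy: w(vy, q) * (p / q) * (1.0 - vy), 0.0, 1.0 - q)
    return a_to_b + b_to_a


for p, q in ((0.5, 0.5), (0.3, 0.6)):
    print(p, q, round(sum_rate_numeric(p, q), 4), round(sum_rate_closed_form(p, q), 4),
          round(sum_rate_two_messages(p, q), 4))
```

Each printed row lists the numerical integral, the closed form, and the 2-message minimum sum-rate for comparison.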

Note that the achievable sum-rate (4.5) is not shown to be the optimal sum-rate $R_{sum,\infty}$ because we only consider
a particular construction of the auxiliary random variables. We have, however, the following lower bound for
$R_{sum,\infty}$ which can be proved by a technique which is similar to the proof of Theorem 2.

Theorem 4: If $X \perp Y$, $X \sim$ Ber$(p)$, $Y \sim$ Ber$(q)$, $f_A(x,y) = f_B(x,y) = x \wedge y$, $0 < p,q < 1$, we have

$R_{sum,\infty} \geq h_2(p) + h_2(q) - (1-pq)\,h_2\!\left(\frac{(1-p)(1-q)}{1-pq}\right)$.

The proof is given in Appendix III. This lower bound is strictly less than (4.6) when $0 < p,q < 1$. For example,
when $p = q = 1/2$ ($(X,Y) \sim$ iid Ber$(1/2)$), the bound in Theorem 4 gives us
$R_{sum,\infty} \geq (2 - (3/4)h_2(1/3)) \approx 1.311$, compared with the infinite-message achievable sum-rate 1.361.
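The gap between the lower bound and the achievable rate can be tabulated as follows. This sketch assumes that the bound of Theorem 4 has the form $h_2(p) + h_2(q) - (1-pq)\,h_2((1-p)(1-q)/(1-pq))$, which reproduces the quoted value $2 - (3/4)h_2(1/3)$ at $p = q = 1/2$; the achievable rate is taken from (4.6), and the function names are illustrative.

```python
from math import e, log2


def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)


def lower_bound(p, q):
    """Assumed form of the Theorem 4 lower bound on the infinite-message sum-rate."""
    return h2(p) + h2(q) - (1 - p * q) * h2((1 - p) * (1 - q) / (1 - p * q))


def achievable(p, q):
    """Achievable infinite-message sum-rate from the closed form (4.6)."""
    return h2(p) + p * h2(q) + p * log2(q) + p * (1 - q) * log2(e)


for p, q in ((0.5, 0.5), (0.3, 0.6)):
    print(f"p={p}, q={q}: {lower_bound(p, q):.4f} <= R_sum,inf <= {achievable(p, q):.4f}")
```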



We can consider multiterminal interactive function computation problems as generalizations of the two-terminal 
interactive function computation problem. At a high level, interactive function computation may be thought of as 
a form of distributed source coding with progressive levels of feedback. Although the multiterminal problem is 
significantly more intricate, important insights can be extracted by leveraging results for the two-terminal problem. 
The ability to progressively refine information bi-directionally in multiple rounds lies at the heart of interactive 
function computation. This ability to refine information can have a significant impact on the efficiency of information 
transport in large networks as discussed in Section V-C (see Example 3).

A. Problem formulation 

Let $m$ be the number of nodes. Consider $m$ statistically dependent discrete memoryless stationary sources taking
values in finite alphabets. For each $j$, where $j$ takes integer values 1 through $m$, let
$\mathbf{X}_j := (X_j(1),\ldots,X_j(n)) \in (\mathcal{X}_j)^n$ denote the $n$ source samples which are available at node
$j$. For $i = 1,\ldots,n$, let $(X_1(i),X_2(i),\ldots,X_m(i)) \sim$ iid $p_{X_1,\ldots,X_m}$, where $p_{X_1,\ldots,X_m}$ is a
joint pmf which describes the statistical dependencies among the samples observed at the $m$ nodes at each time
instant. For each $j$ and $i$, let $Z_j(i) := f_j(X_1(i),\ldots,X_m(i)) \in \mathcal{Z}_j$ and let
$\mathbf{Z}_j := (Z_j(1),\ldots,Z_j(n))$. The tuple $\mathbf{Z}_j$ denotes $n$ samples of the samplewise function of all
the sources which is desired to be computed at node $j$.

Let the topology of the network be characterized by a directed graph $\mathcal{G} = (\mathcal{V},\mathcal{E})$, where
$\mathcal{V} := \{1,\ldots,m\}$ is the vertex set of all the nodes and $\mathcal{E}$ is the edge set of all the directed
links which are available for communication. The network topology describes the connectivity and information flow
constraints in the network. It is assumed that the topology is consistent with the goals of function computation,
that is, for every node which computes a nontrivial function which depends on the source samples at other nodes,
there exists a set of directed paths over which information can be transferred from the relevant nodes to perform the
computation. In order to perform the computations, a $t$-round multiterminal interactive distributed source code for
function computation can be defined by extending the notion of a $t$-round concurrent-message interactive code for the
two-terminal problem (see Section II-E) in the following manner. In the $i$-th round, where $i$ takes integer values 1
through $t$, for each directed link $(j,k) \in \mathcal{E}$, a message $M_{jki}$ is generated at node $j$ as a
pre-specified deterministic function of $\mathbf{X}_j$ and all the messages to and from this node in all the previous
rounds. Then all the messages in the $i$-th round are transferred concurrently over all the available directed links.
After $t$ rounds, at each node $j$, a decoding function reproduces $\mathbf{Z}_j$ as $\hat{\mathbf{Z}}_j$ based on
$\mathbf{X}_j$ and all the messages to and from this node. As part of the $t$-round interactive code specification, a
message over any link in any round is allowed to be a null message, i.e., no message is sent over the link, and this
is known in advance as part of the code. By incorporating null messages, the concurrent-message interactive coding
scheme described above subsumes all conceivable types of interaction. Let a link be called active in a given round if
it does not carry a null message in that round. For each round $i$, let $\mathcal{E}_i$ denote the subset of directed
links in $\mathcal{E}$ which are active. A $t$-round interaction protocol is the sequence of directed subgraphs
$\mathcal{E}_1,\ldots,\mathcal{E}_t$ which describes how the nodes are permitted to exchange messages over different
rounds. This controls the dynamics of information flow in the network.

Our key point of view, illustrated in Figure 3, is that interactive function computation is, at its heart, an
interaction protocol which successively switches the information-flow topology among several basic distributed
source coding configurations. In the two-terminal case, the alternating-message interaction protocol is simple:
messages alternate from one node to the other, the only free parameter in the protocol being the initial node, which
must be chosen to minimize the sum rate. For this protocol, there is essentially only one type of configuration and
accordingly only one basic distributed source coding strategy, namely, Wyner-Ziv-like coding with all the previously
received messages as common side-information available to both the nodes. The multiterminal case is, however,
significantly more intricate. For instance, with three nodes there are several basic configurations in addition to
the point-to-point one, e.g., many-to-one, one-to-many, and relay as shown in Figure 3.




Fig. 3. Interactive function computation can be viewed as an interaction protocol which successively switches among
several basic distributed source coding configurations. (Panels: 3-terminal interactive function computation;
many-to-one configuration; one-to-many configuration; relay configuration.)

The efficiency of communication for function computation can be measured at various levels. The most precise
characterization would be in terms of the $(t|\mathcal{E}|)$-dimensional rate tuple
$(R_{jki})_{(j,k) \in \mathcal{E},\, i=1,\ldots,t}$ corresponding to the number of bits per sample in each link in each
round. A coarser characterization would be in terms of the $|\mathcal{E}|$-dimensional total-rate tuple
$(R_{jk})_{(j,k) \in \mathcal{E}}$, where $R_{jk}$ is the total number of bits per sample transferred through link $(j,k)$
in all the rounds. The coarsest characterization would be in terms of the sum-total-rate, which is the sum of the total
number of bits per sample in all the rounds through all the links. One can then define admissible rates, admissible
total-rates, and the minimum sum-total-rate $R_{sum,t}$, following Definition 2, in terms of rates for which there exist
encoding and decoding functions for which the block error probability of function computation goes to zero as
the blocklength goes to infinity. Let $t^*$ denote the minimum number of rounds for which function computation is
feasible. Computation is nontrivial if $t^* \geq 1$. Clearly, $t^*$ is not more than the diameter of the largest connected
component of the network, which is itself not more than $(m-1)$. Hence $t^* \leq (m-1)$. We will consider interaction
to be useful if $R_{sum,t} < R_{sum,t^*}$ for some $t > t^*$.

November 12, 2008 DRAFT

The search for an optimum interactive code is a twofold search over all interaction protocols and over all 
distributed source codes. The interaction protocol dictates which nodes transmit and which nodes receive messages 
in each round. The distributed source code dictates what information to send and how to decode it. In the two- 
terminal case, the standard machinery of random coding and binning is adequate to characterize the rate region and 
the minimum sum rate because it can be viewed as a sequence of Wyner-Ziv-like codes. In the multiterminal case, 
however, finding a computable characterization of the rate regions in terms of single-letter information measures 
can be challenging because the rate regions for even non-interactive special cases, such as the many-to-one,
one-to-many, and relay configurations (see Figure 3), are longstanding open problems. For many of these configurations,
the standard machinery of random coding and binning falls short of giving the optimal performance as exemplified by
the Körner-Marton problem [10]. These difficulties notwithstanding, results for the two-terminal interactive function
computation problem can be used to develop insightful performance bounds and architectural guidelines for the 
general multiterminal problems. This is discussed in the following two subsections. 

B. Cut-set bounds 

Given any $t$-round multiterminal interactive function computation problem, we can formulate a $t$-round two-terminal
interactive function computation problem with concurrent messages by regarding a set of nodes
$\mathcal{S} \subset \mathcal{V}$ as one terminal and the complement $\mathcal{S}^c$ as the other. The minimum sum-rate
for this two-terminal problem is a lower bound for the minimum sum-total-rate between $\mathcal{S}$ and
$\mathcal{S}^c$ in the original multiterminal problem.

Let $R_{A,B} := \sum_{j \in A,\, k \in B,\, (j,k) \in \mathcal{E}} R_{jk}$ denote the sum-total-rate from a set of nodes
$A$ to a set of nodes $B$ (over all rounds and over all available directed links from $A$ to $B$). Let
$R^{\mathcal{S}}_{sum,t}$ denote the minimum sum-rate of the $t$-round two-terminal problem with concurrent messages
with sources $(X_j)_{j \in \mathcal{S}}$ at A and $(X_j)_{j \in \mathcal{S}^c}$ at B and functions
$(f_j(X^m))_{j \in \mathcal{S}}$ and $(f_j(X^m))_{j \in \mathcal{S}^c}$ to be computed at A and B respectively. A
systematic method for developing cut-set lower bounds for the minimum sum-total-rate of the $t$-round multiterminal
problem is to formulate a linear program with $(R_{jk})_{(j,k) \in \mathcal{E}}$ as the variables and the sum-total-rate
$\sum_{(j,k) \in \mathcal{E}} R_{jk}$ as the linear objective function to be minimized subject to the following linear
inequality constraints: $\forall \mathcal{S} \subset \mathcal{V}$,
$R_{\mathcal{S},\mathcal{S}^c} \geq H((f_j(X^m))_{j \in \mathcal{S}^c} \,|\, (X_j)_{j \in \mathcal{S}^c})$,
$R_{\mathcal{S}^c,\mathcal{S}} \geq H((f_j(X^m))_{j \in \mathcal{S}} \,|\, (X_j)_{j \in \mathcal{S}})$,
$(R_{\mathcal{S},\mathcal{S}^c} + R_{\mathcal{S}^c,\mathcal{S}}) \geq R^{\mathcal{S}}_{sum,t}$, and
$R_{jk} \geq 0$, $\forall j \neq k$. Note that the first two constraints respectively come from the first two
terms on the right side of Corollary 1(ii). Such cut-set bounds can often provide insights into when interaction
may be useful and when it may not be (see examples below).

C. Examples 

Example 1: Consider three nodes with sources $(X_1,X_2) \sim$ DSBS$(p)$, $p \in (0,1)$, and $X_3 = 0$. The functions
desired at nodes 1, 2, and 3 are $f_1 = 0$, $f_2 = 0$, and $f_3(x_1,x_2) = x_1 \oplus x_2$ respectively. In other words,
correlated sources $X_1$ and $X_2$ are available at nodes 1 and 2 respectively, and node 3 needs to compute the
samplewise Boolean XOR function $X_1 \oplus X_2$. Assume that this three-terminal network has a fully connected
topology $\mathcal{E}$.

First consider the 1-round many-to-one interaction protocol given by $\mathcal{E}_1 = \{(1,3),(2,3)\}$. Under this
interaction protocol, the distributed function computation problem reduces to the Körner-Marton problem [10] and is
illustrated in Figure 4(a). The distributed source coding scheme of Körner and Marton based on binary linear codes
(see [10]) achieves the goal of computing the Boolean XOR at node 3 with
$R_{131} = R_{231} = R_{13} = R_{23} = H(X_1 \oplus X_2) = h_2(p)$. Hence the sum-total-rate of this non-interactive
many-to-one coding scheme is given by $R_{13} + R_{23} = 2h_2(p)$ bits per sample. Thus, in this example $t^* = 1$ and
the coding is non-interactive.

Fig. 4. (a) Many-to-one Körner-Marton scheme, (b) Relay scheme, (c) General interactive scheme. When
$(X_1,X_2) \sim$ DSBS$(p)$, $p \in (0,1)$, all three schemes have the same minimum sum-total-rate $2h_2(p)$.

Next, consider the 2-round relay-based interaction protocol given by $\mathcal{E}_1 = \{(1,2)\}$ and
$\mathcal{E}_2 = \{(2,3)\}$ as illustrated in Figure 4(b). Consider the following coding strategy. Using Slepian-Wolf
coding in the first round, with $R_{121} = R_{12} = H(X_1|X_2) = h_2(p)$, $X_1$ can be reproduced at node 2. Then,
$X_1 \oplus X_2$ can be computed at node 2 and the result of the computation can be conveyed to node 3 in the second
round by entropy-coding at the rate given by $R_{232} = R_{23} = H(X_1 \oplus X_2) = h_2(p)$. Hence the sum-total-rate of
this relay scheme is given by $R_{12} + R_{23} = 2h_2(p)$ bits per sample. Since under this protocol information is
constrained to flow in only one direction from source node 1 to source node 2 in round one and then from node 2 to the
destination node 3 in round two, distributed source codes which respect this protocol are, truly speaking,
non-interactive.

Finally, consider general $t$-round interactive codes. The cut-set lower bound between $\{1\}$ and $\{2,3\}$ for computing
$X_1 \oplus X_2$ at $\{2,3\}$ gives $R_{12} + R_{13} \geq H(X_1 \oplus X_2 | X_2) = h_2(p)$. Interchanging the roles of nodes 1 and 2 in the
previous cut-set bound we also have $R_{21} + R_{23} \geq H(X_1 \oplus X_2 | X_1) = h_2(p)$. Adding these two bounds gives
$R_{12} + R_{13} + R_{21} + R_{23} \geq 2h_2(p)$. Hence, $R_{sum,t} \geq 2h_2(p)$. This shows that the sum-total-rates of the many-to-one
Körner-Marton and the relay schemes are optimum. No amount of interaction can reduce the sum-total-rate of
these non-interactive schemes.
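The rate bookkeeping of Example 1 can be checked numerically. The following is a minimal sketch, assuming the usual convention that DSBS($p$) puts mass $(1-p)/2$ on each pair with $x_1 = x_2$ and $p/2$ on each pair with $x_1 \neq x_2$; the function names are illustrative and are not taken from the paper.

```python
from math import log2


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)


def dsbs(p):
    """Joint pmf of (X1, X2) under the assumed DSBS(p) convention."""
    return {(x1, x2): (p if x1 != x2 else 1 - p) / 2
            for x1 in (0, 1) for x2 in (0, 1)}


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * log2(q) for q in pmf.values() if q > 0)


def marginal(pmf, func):
    """Pmf of func(outcome) induced by the joint pmf."""
    out = {}
    for outcome, q in pmf.items():
        key = func(outcome)
        out[key] = out.get(key, 0.0) + q
    return out


def example1_rates(p):
    """Return (Korner-Marton total, relay total, cut-set bound) in bits per sample."""
    joint = dsbs(p)
    h_xor = entropy(marginal(joint, lambda o: o[0] ^ o[1]))            # H(X1 xor X2)
    h_x1_given_x2 = entropy(joint) - entropy(marginal(joint, lambda o: o[1]))
    korner_marton = 2 * h_xor          # R13 + R23
    relay = h_x1_given_x2 + h_xor      # R12 + R23
    cut_set = 2 * h2(p)                # lower bound on any t-round code
    return korner_marton, relay, cut_set
```

Calling `example1_rates(p)` for any $p \in (0,1)$ returns three values that agree up to floating-point error, which is the statement that neither scheme can be improved by interaction.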

Example 2: Consider three nodes with sources $(X_1, X_2) \sim \mathrm{DSBS}(p)$, $p \in (0,1)$, and $X_3 = 0$. The functions desired
at nodes 1, 2, and 3 are $f_1 = 0$, $f_2 = 0$, and $f_3(x_1, x_2) = x_1 \wedge x_2$ respectively. In other words, correlated sources $X_1$
and $X_2$ are available at nodes 1 and 2 respectively, and node 3 needs to compute the samplewise Boolean AND
function instead of the XOR function in Example 1. As in Example 1, assume that this three-terminal network has
a fully connected topology $\mathcal{E}$.

Consider a general $t$-round interactive code with the following interaction protocol: for all $i = 1, \ldots, t$,
$\mathcal{E}_i = \{(1,3), (3,1), (2,3), (3,2)\}$ (see Figure 5(a)). Note that nodes 1 and 2 cannot directly communicate with each other
under this interaction protocol. Due to Theorem 2 the cut-set lower bound between $\{1\}$ and $\{2,3\}$ for computing
$X_1 \wedge X_2$ at $\{2,3\}$ is given by: $R_{13} + R_{31} \geq H(X_1|X_2) = h_2(p)$. Similarly, we have $R_{23} + R_{32} \geq H(X_2|X_1) = h_2(p)$.
Adding these two bounds gives $R_{13} + R_{31} + R_{23} + R_{32} \geq 2h_2(p)$. It should be clear that $t^* = 1$ because nodes 1 and
2 can send all their source samples to node 3 in one round. If there is only one round, there is no advantage to be
gained by transferring messages between nodes 1 and 2. This observation, together with the above cut-set bound
shows that $R_{sum,t^*} \geq 2h_2(p)$.

Fig. 5. (a) Interactive many-to-one scheme, (b) Relay scheme. When $(X_1, X_2) \sim \mathrm{DSBS}(p)$ and $p \in (1/3, 1)$, the minimum sum-total-rate for (b) is
less than that for (a).

Now consider the 2-round relay scheme illustrated in Figure 5(b). Using Slepian-Wolf coding in the first round,
with $R_{121} = R_{12} = H(X_1|X_2) = h_2(p)$, $X_1$ can be reproduced at node 2. Then, $X_1 \wedge X_2$ can be computed at
node 2 and the result of the computation can be conveyed to node 3 in the second round by entropy-coding at
the rate given by $R_{232} = R_{23} = H(X_1 \wedge X_2) = h_2\left(\frac{1-p}{2}\right)$. Hence the sum-total-rate of this relay scheme is given
by $R_{12} + R_{23} = h_2(p) + h_2\left(\frac{1-p}{2}\right)$ bits per sample, which is less than $2h_2(p)$ when $p > 1/3$. Thus, for $p > 1/3$,
$R_{sum,2} < R_{sum,t^*}$ and interaction is useful.¹ In fact, when $p > 1/3$, a single message from node 1 to node 2 is more
beneficial in terms of the sum-total-rate than multiple rounds of two-way communication between nodes 1 and 3
and between nodes 2 and 3.
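The threshold $p = 1/3$ in Example 2 can be seen by tabulating the two totals. This is a small sketch under the assumption that DSBS($p$) assigns probability $(1-p)/2$ to the pair $(1,1)$, so that $X_1 \wedge X_2$ is Bernoulli with parameter $(1-p)/2$; the function names are illustrative.

```python
from math import log2


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)


def many_to_one_rate(p):
    """Cut-set value 2*h2(p) for the protocol in which nodes 1 and 2 talk only to node 3."""
    return 2 * h2(p)


def relay_rate(p):
    """Relay total: Slepian-Wolf rate H(X1|X2) plus entropy-coded AND, H(X1 and X2)."""
    return h2(p) + h2((1 - p) / 2)


def relay_gain(p):
    """Positive when the relay scheme uses fewer bits per sample."""
    return many_to_one_rate(p) - relay_rate(p)
```

For instance, at $p = 0.5$ the relay total is about 1.81 bits per sample against 2 bits for the many-to-one protocol, while `relay_gain(p)` is negative for $p$ below $1/3$.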

Example 3: Consider $m \geq 3$ nodes and $m$ independent sources $X_1, \ldots, X_m$ each of which is iid Ber(1/2). For each
$i$, the $i$-th source $X_i$ is observed at only the $i$-th node. Only node 1 needs to compute the function $f_1(x^m) = \min_{j=1}^{m}(x_j)$.
Assume that the network has a star topology with node 1 as the central node as illustrated in Figure 6. Specifically,
let $\mathcal{E} = \{(j,1), (1,j)\}_{j=2}^{m}$.

Consider non-interactive coding schemes in which information is constrained to flow in only one direction from
the leaf nodes to the central node as illustrated in Figure 6(a). Specifically, the interaction protocol is given by
$\mathcal{E}_i = \{(j,1)\}_{j=2}^{m}$ for each $i = 1, \ldots, t$. Since information flows in only one direction from the leaf nodes to the central
node, there is no loss of generality in assuming that $t = 1$. For each $j = 2, \ldots, m$, let us compute the cut-set bound
with $S = \{j\}$ and $t = 1$. Using Lemma 2 we obtain $R_{j1} \geq H(X_j | X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_m) = H(X_j) = 1$.
Therefore, $R_{sum,1} \geq (m-1)$. Since this is achievable by transferring all the data to node 1, $R_{sum,1} = (m-1) = \Theta(m)$.
Thus, in this example, $t^* = 1$.

¹ Truly speaking, this coding scheme is non-interactive because information flows in only one direction from node 1 to node 2 and then from
node 2 to node 3.

Fig. 6. (a) Non-interactive function computation, (b) Interactive function computation. When $(X_1, \ldots, X_m) \sim$ iid Ber(1/2), the minimum
sum-total-rate for (b) is orderwise smaller than that for (a).



Now consider the following $(2m-2)$-round interactive coding scheme in which information flows in both directions,
from the leaf nodes to the central node and back, as illustrated in Figure 6(b). In round number $(2i-1)$, where $i$ ranges
through integers from 1 through $(m-1)$, node 1 sends the sequence $(\min_{j=1}^{i}(X_j(k)))_{k=1}^{n}$ to node $(i+1)$ at the rate
$R_{1(i+1)(2i-1)} = H(\min_{j=1}^{i}(X_j)) = h_2(1/2^i)$ bits per sample. Node $(i+1)$ then computes the sequence $(\min_{j=1}^{i+1}(X_j(k)))_{k=1}^{n}$
and sends it back to node 1 in round number $2i$, using Slepian-Wolf coding (or conditional coding) with the previous
message as correlated side information available to the decoder (and the encoder). This can be done at the rate
given by $R_{(i+1)1(2i)} = H(\min_{j=1}^{i+1}(X_j) \mid \min_{j=1}^{i}(X_j)) = 1/2^i$ bits per sample. It can be verified that the message sequence
in round number $(2m-2)$ is the desired function.

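The sum-total-rate of this scheme is given by

$$\sum_{i=1}^{m-1}\left(h_2\left(\frac{1}{2^i}\right) + \frac{1}{2^i}\right) \leq \sum_{i=1}^{m-1}\left(\frac{i + \log_2 e}{2^i} + \frac{1}{2^i}\right) < \sum_{i=1}^{\infty}\frac{i + 1 + \log_2 e}{2^i} = 3 + \log_2 e,$$

where the first inequality is because $h_2(p) \leq p\log_2(e/p)$. Thus for all $m \geq 6$, $R_{sum,(2m-2)} < 3 + \log_2 e < 5 \leq (m-1) = R_{sum,t^*}$,
showing that interaction is useful. In fact, the minimum sum-total-rate $R_{sum,(2m-2)}$ is $O(1)$ with respect to
the number of nodes $m$ in the network. This is orderwise smaller than $\Theta(m)$ for any 1-round non-interactive coding
scheme.²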
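The scaling claim of Example 3 can be tabulated directly from the per-round rates stated above, namely $h_2(1/2^i)$ from node 1 to node $(i+1)$ and $1/2^i$ on the way back. The following is a minimal sketch; the function names are illustrative.

```python
from math import e, log2


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)


def interactive_sum_rate(m):
    """Total rate of the (2m-2)-round scheme: rounds 2i-1 and 2i for i = 1..m-1."""
    return sum(h2(2.0 ** -i) + 2.0 ** -i for i in range(1, m))


def non_interactive_sum_rate(m):
    """One bit per sample from each of the m-1 leaf nodes."""
    return m - 1


def uniform_bound():
    """The m-independent bound 3 + log2(e) quoted in the text."""
    return 3 + log2(e)
```

Evaluating `interactive_sum_rate(m)` for growing `m` gives an increasing sequence that stays below `uniform_bound()`, whereas `non_interactive_sum_rate(m)` grows linearly.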

The above examples can be interpreted in two ways. From the perspective of protocol design, these examples 
show that for a given topology, certain information-routing configurations are fundamentally more efficient than 
certain others for function computation. From the perspective of network architecture, these examples show that 
certain topologies are fundamentally more efficient than certain others for function computation. The last example 
shows that the scaling laws governing the information transport efficiency in large networks can be dramatically 
different depending on whether the information transport is interactive or non-interactive. 

² Note the following: a) In studying how the minimum sum-total-rate scales with network size, the coding blocklength is out of the picture
because it has already been "sent to infinity". b) Even though $H(\min_{j=1}^{m}(X_j)) \to 0$ as $m \to \infty$, we cannot have nodes send nothing ($t = 0$) and
set the output of node 1 to be identically zero. This is because then the probability of block error will be equal to one.

VI. Concluding remarks

In this paper, we studied the two-terminal interactive function computation problem within a distributed source
coding framework and demonstrated that the benefit of interaction depends on both the function-structure and the
distribution-structure. We formulated a multiterminal interactive function computation problem and demonstrated that
interaction can change the scaling law of communication efficiency in large networks. There are several directions for
future work. In two-terminal interactive function computation, a computable characterization of the infinite-message
minimum sum-rate is still open. The achievable infinite-message sum-rate of Section IV-F involving definite integrals
and a rate-allocation curve appears to be a promising approach. We have obtained only a partial characterization of
the structure of functions and distributions for which interaction is not beneficial. An interesting direction would be
to find necessary and sufficient conditions under which interaction is useful. The multiterminal interactive function
computation problem is wide open. A promising direction would be to study how the total network rate scales with
network size and understand how it is related to the network topology, the function structure, and the
distribution-structure.



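Appendix I

Theorem 1 converse proof

If a rate tuple $\mathbf{R} = (R_1, \ldots, R_t)$ is admissible for the $t$-message interactive function computation with initial
location $A$, then $\forall \epsilon > 0$, there exists $N(\epsilon, t)$, such that $\forall n > N(\epsilon, t)$ there exists an interactive distributed source
code with initial location $A$ and parameters $(t, n, |\mathcal{M}_1|, \ldots, |\mathcal{M}_t|)$ satisfying

$$\frac{1}{n}\log_2|\mathcal{M}_j| \leq R_j + \epsilon, \quad j = 1, \ldots, t,$$

$$P(\mathbf{Z}_A \neq \hat{\mathbf{Z}}_A) \leq \epsilon, \quad P(\mathbf{Z}_B \neq \hat{\mathbf{Z}}_B) \leq \epsilon.$$

Define auxiliary random variables $\forall i = 1, \ldots, n$, $U_1(i) := \{M_1, \mathbf{X}(i-), \mathbf{Y}(i+)\}$, and for $j = 2, \ldots, t$, $U_j := M_j$.

Information inequalities: For the first rate, we have

$$
\begin{aligned}
n(R_1 + \epsilon) &\geq H(M_1) \\
&\geq H(M_1 \mid \mathbf{Y}) \\
&\geq I(M_1; \mathbf{X} \mid \mathbf{Y}) \\
&= H(\mathbf{X} \mid \mathbf{Y}) - H(\mathbf{X} \mid M_1, \mathbf{Y}) \\
&= \sum_{i=1}^{n} \left[ H(X(i) \mid Y(i)) - H(X(i) \mid \mathbf{X}(i-), M_1, \mathbf{Y}) \right] \\
&\geq \sum_{i=1}^{n} \left[ H(X(i) \mid Y(i)) - H(X(i) \mid \mathbf{X}(i-), M_1, Y(i), \mathbf{Y}(i+)) \right] \\
&= \sum_{i=1}^{n} I(X(i); M_1, \mathbf{X}(i-), \mathbf{Y}(i+) \mid Y(i)).
\end{aligned}
\tag{I.1}
$$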
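For an odd $j \geq 2$, we have

$$
\begin{aligned}
n(R_j + \epsilon) &\geq H(M_j) \\
&\geq H(M_j \mid M^{j-1}, \mathbf{Y}) \\
&\geq I(M_j; \mathbf{X} \mid M^{j-1}, \mathbf{Y}) \\
&= H(\mathbf{X} \mid M^{j-1}, \mathbf{Y}) - H(\mathbf{X} \mid M^{j}, \mathbf{Y}) \\
&= \sum_{i=1}^{n} \left[ H(X(i) \mid \mathbf{X}(i-), M^{j-1}, \mathbf{Y}) - H(X(i) \mid \mathbf{X}(i-), M^{j}, \mathbf{Y}) \right] \\
&\overset{(a)}{=} \sum_{i=1}^{n} \left[ H(X(i) \mid \mathbf{X}(i-), M^{j-1}, Y(i), \mathbf{Y}(i+)) - H(X(i) \mid \mathbf{X}(i-), M^{j}, \mathbf{Y}) \right] \\
&\geq \sum_{i=1}^{n} \left[ H(X(i) \mid \mathbf{X}(i-), M^{j-1}, Y(i), \mathbf{Y}(i+)) - H(X(i) \mid \mathbf{X}(i-), M^{j}, Y(i), \mathbf{Y}(i+)) \right] \\
&= \sum_{i=1}^{n} I(X(i); M_j \mid M^{j-1}, \mathbf{X}(i-), \mathbf{Y}(i+), Y(i)) \\
&= \sum_{i=1}^{n} I(X(i); U_j \mid U_1(i), U_2^{j-1}, Y(i)).
\end{aligned}
\tag{I.2}
$$

Step (a) is because the Markov chain $X(i) - (M^{j-1}, \mathbf{X}(i-), Y(i), \mathbf{Y}(i+)) - \mathbf{Y}(i-)$ holds for each $i = 1, \ldots, n$.
Similarly, for an even $j \geq 2$, we have

$$
\begin{aligned}
n(R_j + \epsilon) &\geq H(M_j) \\
&\geq I(M_j; \mathbf{Y} \mid M^{j-1}, \mathbf{X}) \\
&= H(\mathbf{Y} \mid M^{j-1}, \mathbf{X}) - H(\mathbf{Y} \mid M^{j}, \mathbf{X}) \\
&= \sum_{i=1}^{n} \left[ H(Y(i) \mid \mathbf{Y}(i+), M^{j-1}, \mathbf{X}) - H(Y(i) \mid \mathbf{Y}(i+), M^{j}, \mathbf{X}) \right] \\
&\overset{(b)}{=} \sum_{i=1}^{n} \left[ H(Y(i) \mid \mathbf{Y}(i+), M^{j-1}, X(i), \mathbf{X}(i-)) - H(Y(i) \mid \mathbf{Y}(i+), M^{j}, \mathbf{X}) \right] \\
&\geq \sum_{i=1}^{n} \left[ H(Y(i) \mid \mathbf{Y}(i+), M^{j-1}, X(i), \mathbf{X}(i-)) - H(Y(i) \mid \mathbf{Y}(i+), M^{j}, X(i), \mathbf{X}(i-)) \right] \\
&= \sum_{i=1}^{n} I(Y(i); M_j \mid M^{j-1}, \mathbf{X}(i-), \mathbf{Y}(i+), X(i)) \\
&= \sum_{i=1}^{n} I(Y(i); U_j \mid U_1(i), U_2^{j-1}, X(i)).
\end{aligned}
\tag{I.3}
$$

Step (b) is because the Markov chain $Y(i) - (M^{j-1}, \mathbf{X}(i-), X(i), \mathbf{Y}(i+)) - \mathbf{X}(i+)$ holds for each $i = 1, \ldots, n$.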
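By the condition $P(\mathbf{Z}_A \neq \hat{\mathbf{Z}}_A) \leq \epsilon$ and Fano's inequality [5],

$$
\begin{aligned}
h_2(\epsilon) + \epsilon \log_2(|\mathcal{Z}_A|^n - 1) &\geq \sum_{i=1}^{n} H(Z_A(i) \mid \mathbf{Z}_A(i+), M^t, \mathbf{X}) \\
&\geq \sum_{i=1}^{n} H(Z_A(i) \mid \mathbf{Z}_A(i+), \mathbf{Y}(i+), M^t, \mathbf{X}) \\
&\overset{(c)}{=} \sum_{i=1}^{n} H(Z_A(i) \mid \mathbf{Y}(i+), M^t, \mathbf{X}) \\
&\overset{(d)}{=} \sum_{i=1}^{n} H(Z_A(i) \mid \mathbf{Y}(i+), M^t, \mathbf{X}(i-), X(i)) \\
&= \sum_{i=1}^{n} H(Z_A(i) \mid U_1(i), U_2^t, X(i)).
\end{aligned}
\tag{I.4}
$$

Step (c) is because for each $i$, $Z_A(i) = f_A(X(i), Y(i))$. Step (d) is because the Markov chain
$Z_A(i) - (X(i), Y(i)) - (M^t, \mathbf{X}(i-), X(i), \mathbf{Y}(i+)) - \mathbf{X}(i+)$ holds for each $i$. Similarly we also have

$$h_2(\epsilon) + \epsilon \log_2(|\mathcal{Z}_B|^n - 1) \geq \sum_{i=1}^{n} H(Z_B(i) \mid U_1(i), U_2^t, Y(i)). \tag{I.5}$$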

Timesharing: Then we introduce a timesharing random variable $Q$ taking value in $\{1, \ldots, n\}$ equally likely, which is
independent of all the other random variables. Defining $U_1 := (U_1(Q), Q)$, $X := X(Q)$, $Y := Y(Q)$, $Z_A := Z_A(Q)$, $Z_B := Z_B(Q)$,
we can continue (I.1) as

$$
\begin{aligned}
R_1 + \epsilon &\geq \frac{1}{n}\sum_{i=1}^{n} I(X(i); U_1(i) \mid Y(i)) \\
&= I(X(Q); U_1(Q) \mid Y(Q), Q) \\
&\overset{(e)}{=} I(X(Q); U_1(Q), Q \mid Y(Q)) \\
&= I(X; U_1 \mid Y),
\end{aligned}
\tag{I.6}
$$

where step (e) is because $Q$ is independent of all the other random variables and the joint pmf of $(X(Q), Y(Q)) \sim p_{XY}$
does not depend on $Q$. Similarly, (I.2) and (I.3) become

$$
R_j + \epsilon \geq
\begin{cases}
I(X; U_j \mid Y, U^{j-1}), & j \geq 2,\ j \text{ odd}, \\
I(Y; U_j \mid X, U^{j-1}), & j \geq 2,\ j \text{ even}.
\end{cases}
\tag{I.7}
$$

(I.4) and (I.5) become

$$\frac{1}{n}h_2(\epsilon) + \epsilon \log_2|\mathcal{Z}_A| \geq H(Z_A \mid U^t, X), \tag{I.8}$$

$$\frac{1}{n}h_2(\epsilon) + \epsilon \log_2|\mathcal{Z}_B| \geq H(Z_B \mid U^t, Y). \tag{I.9}$$


Concerning the Markov chains, we can verify that $U_1(i) - X(i) - Y(i)$ holds for each $i = 1, \ldots, n$, $\Rightarrow I(U_1(Q); Y(Q) \mid X(Q), Q) = 0$
$\Rightarrow I(U_1(Q), Q; Y(Q) \mid X(Q)) = 0 \Rightarrow I(U_1; Y \mid X) = 0$. For each odd $j \geq 2$, we can verify that $U_j - (X(i), U_1(i), U_2^{j-1}) - Y(i)$
holds for each $i$, $\Rightarrow I(U_j; Y(Q) \mid X(Q), U_1(Q), U_2^{j-1}, Q) = 0 \Rightarrow I(U_j; Y \mid X, U^{j-1}) = 0$. Similarly, we can prove the
Markov chains for even $j$'s. So we have

$$
\begin{aligned}
I(U_j; Y \mid X, U^{j-1}) &= 0, \quad j \text{ odd}, \\
I(U_j; X \mid Y, U^{j-1}) &= 0, \quad j \text{ even}.
\end{aligned}
\tag{I.10}
$$

Cardinality bounds: The cardinalities $|\mathcal{U}_j|$, $j = 1, \ldots, t$, can be bounded as in (3.2) by counting the constraints that
the $U_j$'s need to satisfy and applying the Carathéodory theorem recursively as explained below (also see [13]). Let
$U^t$ be a given set of random variables satisfying (I.6) to (I.10). If $|\mathcal{U}_j|$, $j = 1, \ldots, t$, are larger than the alphabet sizes
given by (3.2), it is possible to derive an alternative set of random variables satisfying (3.2) while preserving the
values on the right side of (I.6) to (I.9) fixed by the given $U^t$ as well as all the Markov chains (I.10) satisfied by the
given $U^t$. The derivation of an alternative set of random variables from $U^t$ has a recursive structure. Suppose that
for $j = 1, \ldots, (k-1)$, alternative $U_j$ have been derived satisfying (3.2) without changing the right sides of (I.6) to
(I.9) and without violating the Markov chain constraints (I.10). We focus on deriving an alternative random variable
$\tilde{U}_k$ from $U_k$. We illustrate the derivation for only an odd-valued $k$. The joint pmf of $(X, Y, U^t)$ can be factorized as

$$p_{XYU^t} = p_{U_k}\, p_{XU^{k-1}|U_k}\, p_{Y|XU^{k-1}}\, p_{U_{k+1}^t|XYU^k} \tag{I.11}$$

due to the Markov chain $U_k - (X, U^{k-1}) - Y$. It should be noted that $Z_A$ and $Z_B$, being deterministic functions of
$(X, Y)$, are conditionally independent of $U^t$ given $(X, Y)$. The main idea is to alter $p_{U_k}$ to $p_{\tilde{U}_k}$ keeping fixed all the
other factors on the right side of (I.11). We alter $p_{U_k}$ to $p_{\tilde{U}_k}$ in a manner which leaves $p_{XYU^{k-1}}$ unchanged while
simultaneously preserving the right sides of (I.6) to (I.9). Leaving $p_{XYU^{k-1}}$ unchanged ensures that the Markov chain
constraints (I.10) continue to hold for $U^{k-1}$. Fixing all the factors in (I.11) except the first ensures that the Markov
chain constraints (I.10) continue to hold for $(\tilde{U}_k, U_{k+1}^t)$. To keep $p_{XYU^{k-1}}$ unchanged, it is sufficient to keep $p_{XU^{k-1}}$
unchanged because $p_{Y|XU^{k-1}}$ is kept fixed in (I.11). Keeping $p_{XU^{k-1}}$ and $p_{XU^{k-1}|U_k}$ fixed while altering $p_{U_k}$ requires
that

$$p_{XU^{k-1}}(x, u^{k-1}) = \sum_{u_k} p_{\tilde{U}_k}(u_k)\, p_{XU^{k-1}|U_k}(x, u^{k-1} \mid u_k) \tag{I.12}$$

hold for all tuples $(x, u^{k-1})$. This leads to $\left(|\mathcal{X}| \prod_{j=1}^{k-1} |\mathcal{U}_j| - 1\right)$ linear constraints on $p_{\tilde{U}_k}$ (the minus one is because
$\sum_{x, u^{k-1}} p_{XU^{k-1}}(x, u^{k-1}) = 1$). With $p_{XYU^{k-1}}$ unchanged, the right sides of (I.6) and (I.7) for $j = 1, \ldots, (k-1)$ also
remain unchanged. For $j = k$, $k$ odd, the right side of (I.7) can be written as follows

$$i_k = H(X \mid Y, U^{k-1}) - \sum_{u_k} p_{\tilde{U}_k}(u_k)\, H(X \mid Y, U^{k-1}, U_k = u_k). \tag{I.13}$$

The quantity $i_k$ is equal to the value of $I(X; U_k \mid Y, U^{k-1})$ evaluated for the original set of random variables $U^t$ which
did not satisfy the cardinality bounds (3.2). The quantities $H(X \mid Y, U^{k-1})$ and $H(X \mid Y, U^{k-1}, U_k = u_k)$ in (I.13) are
held fixed because $p_{XYU^{k-1}}$ is kept unchanged and all factors except the first in (I.11) are fixed. In a similar manner,
for each $j > k$, $j$ odd, the right side of (I.7) can be written as follows

$$i_j = \sum_{u_k} p_{\tilde{U}_k}(u_k)\, I(X; U_j \mid Y, U^{k-1}, U_{k+1}^{j-1}, U_k = u_k), \tag{I.14}$$

where $i_j$ is equal to the value of $I(X; U_j \mid Y, U^{j-1})$ evaluated for the original $U^t$ and $I(X; U_j \mid Y, U^{k-1}, U_{k+1}^{j-1}, U_k = u_k)$
is held fixed for all $j > k$, $j$ odd, because all factors except the first in (I.11) are fixed. Again, for each $j > k$, $j$
even, the right side of (I.7) can be written as follows

$$i_j = \sum_{u_k} p_{\tilde{U}_k}(u_k)\, I(Y; U_j \mid X, U^{k-1}, U_{k+1}^{j-1}, U_k = u_k), \tag{I.15}$$

where $i_j$ is equal to the value of $I(Y; U_j \mid X, U^{j-1})$ evaluated for the original $U^t$ and $I(Y; U_j \mid X, U^{k-1}, U_{k+1}^{j-1}, U_k = u_k)$
is held fixed for all $j > k$, $j$ even, because all factors except the first in (I.11) are fixed. The right sides of (I.8) and
(I.9) respectively can also be written as follows

$$h_A = \sum_{u_k} p_{\tilde{U}_k}(u_k)\, H(Z_A \mid X, U^{k-1}, U_{k+1}^{t}, U_k = u_k), \tag{I.16}$$

$$h_B = \sum_{u_k} p_{\tilde{U}_k}(u_k)\, H(Z_B \mid Y, U^{k-1}, U_{k+1}^{t}, U_k = u_k), \tag{I.17}$$

where $h_A$ and $h_B$ are respectively equal to the values of $H(Z_A \mid X, U^t)$ and $H(Z_B \mid Y, U^t)$ evaluated for the original $U^t$, and
$H(Z_A \mid X, U^{k-1}, U_{k+1}^{t}, U_k = u_k)$ and $H(Z_B \mid Y, U^{k-1}, U_{k+1}^{t}, U_k = u_k)$ are held fixed because $Z_A$ and $Z_B$ are deterministic
functions of $(X, Y)$ and all factors except the first in (I.11) are fixed.

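Equations (I.13) through (I.17) impose $(t - k + 3)$ linear constraints on $p_{\tilde{U}_k}$. When the linear constraints imposed by
(I.12) are accounted for, altogether there are no more than $\left(|\mathcal{X}| \prod_{j=1}^{k-1} |\mathcal{U}_j| + t - k + 2\right)$ linear constraints on $p_{\tilde{U}_k}$. The
vector $(\{p_{XU^{k-1}}(x, u^{k-1})\}, i_k, \ldots, i_t, h_A, h_B)$ belongs to the convex hull of $|\mathcal{U}_k|$ vectors whose $\left(|\mathcal{X}| \prod_{j=1}^{k-1} |\mathcal{U}_j| + t - k + 2\right)$
components are given by $\{p_{XU^{k-1}|U_k}(x, u^{k-1} \mid u_k)\}$, $H(X \mid Y, U^{k-1}, U_k = u_k)$, $\{I(X; U_j \mid Y, U^{k-1}, U_{k+1}^{j-1}, U_k = u_k)\}_{j > k,\, j \text{ odd}}$,
$\{I(Y; U_j \mid X, U^{k-1}, U_{k+1}^{j-1}, U_k = u_k)\}_{j > k,\, j \text{ even}}$, $H(Z_A \mid X, U^{k-1}, U_{k+1}^{t}, U_k = u_k)$, and $H(Z_B \mid Y, U^{k-1}, U_{k+1}^{t}, U_k = u_k)$. By the Carathéodory
theorem, $p_{U_k}$ can be replaced by $p_{\tilde{U}_k}$ such that the new random variable $\tilde{U}_k \in \tilde{\mathcal{U}}_k$, where $\tilde{\mathcal{U}}_k \subset \mathcal{U}_k$, contains only
$\left(|\mathcal{X}| \prod_{j=1}^{k-1} |\mathcal{U}_j| + t - k + 3\right)$ elements, while (I.10) and the right sides of (I.6) to (I.9) remain unchanged.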
Taking limits: Thus far, we have shown that $\forall \epsilon > 0$ and $\forall n > N(\epsilon, t)$, $\exists\, p_{U^t|XY}(u^t \mid x, y, \epsilon, n)$ such that $U^t$ satisfy (3.2)
and (I.6) to (I.10). It should be noted that $p_{U^t|XY}(u^t \mid x, y, \epsilon, n)$ may depend on $(\epsilon, n)$, whereas $|\mathcal{U}_j|$, $j = 1, \ldots, t$, is
finite and independent of $(\epsilon, n)$. Therefore, for each $(\epsilon_0, n_0)$, $p_{U^t|XY}(u^t \mid x, y, \epsilon_0, n_0)$ is a finite dimensional stochastic
matrix taking values in a compact set. Let $\{\epsilon_l\}$ be any sequence of real numbers such that $\epsilon_l > 0$ and $\epsilon_l \to 0$
as $l \to \infty$. Let $\{n_l\}$ be any sequence of blocklengths such that $n_l > N(\epsilon_l, t)$. Since $p_{U^t|XY}$ lives in a compact set,
there exists a subsequence of $\{p_{U^t|XY}(u^t \mid x, y, \epsilon_l, n_l)\}$ converging to a limit $p_{\bar{U}^t|XY}(u^t \mid x, y)$. Denote the auxiliary random
variables derived from the limit pmf by $\bar{U}^t$. Due to the continuity of conditional mutual information and conditional
entropy measures, (I.6) to (I.10) become

$$
R_j \geq
\begin{cases}
I(X; \bar{U}_j \mid Y, \bar{U}^{j-1}), \quad I(\bar{U}_j; Y \mid X, \bar{U}^{j-1}) = 0, & j \text{ odd}, \\
I(Y; \bar{U}_j \mid X, \bar{U}^{j-1}), \quad I(\bar{U}_j; X \mid Y, \bar{U}^{j-1}) = 0, & j \text{ even},
\end{cases}
$$

$$H(Z_A \mid \bar{U}^t, X) = 0, \quad H(Z_B \mid \bar{U}^t, Y) = 0.$$

Therefore $\mathbf{R}$ belongs to the right side of (3.1). ■
Remarks: The convexity of the theoretical characterization of the rate region can be established in a manner similar
to the timesharing argument in the above proof. The closedness of the region can also be established in a
manner similar to the limit argument in the last paragraph of the above proof using the following facts: (i) All the
alphabets are finite, thus $p_{U^t|XY}$ takes values in a compact set. Therefore the limit point of a sequence of conditional
probabilities exists. (ii) Conditional mutual information measures are continuous with respect to the probability
distributions.

Appendix II
Proofs of Theorems 2 and 3

Proof of Theorem 2: We need to show that $R^A_{sum,t} \geq H(X|Y)$ only for $p, q \in (0,1)$. If $p, q \in (0,1)$ then $p_{XY}(x, y) > 0$,
$\forall (x, y) \in \mathcal{X} \times \mathcal{Y}$. Let $U^t$ be any set of auxiliary random variables in (3.3) satisfying all the Markov chain and
conditional entropy constraints of (3.1). Due to Lemma 1(i), for any $u^t$, $\mathcal{A}(u^t)$ is a rectangle of $\mathcal{X} \times \mathcal{Y}$. Due to
Lemma 1(iii) and the assumption that $f_B(0, y_0) \neq f_B(1, y_0)$, $\mathcal{A}(u^t)$ cannot be $\mathcal{X} \times \mathcal{Y}$. Therefore $\mathcal{A}(u^t)$ could be a row
of $\mathcal{X} \times \mathcal{Y}$, a column, a singleton, or the empty set. Let

$$
\phi(u^t) :=
\begin{cases}
0, & \text{if } \mathcal{A}(u^t) \text{ is empty}, \\
1, & \text{if } \mathcal{A}(u^t) \text{ is a row of } \mathcal{X} \times \mathcal{Y}, \\
2, & \text{otherwise}.
\end{cases}
$$

Now, $p_{U^t}(u^t) = 0 \Leftrightarrow \mathcal{A}(u^t)$ is empty. Therefore

$$p_{\phi(U^t)}(0) = \sum_{u^t:\, \mathcal{A}(u^t) \text{ is empty}} p_{U^t}(u^t) = 0.$$

Hence $p_{XY\phi(U^t)}(x, y, 0) = 0$ for all $x$ and $y$. By the definition of a row of $\mathcal{X} \times \mathcal{Y}$, we have $H(X \mid U^t, \phi(U^t) = 1) = 0$,
which implies that $H(X \mid Y, U^t, \phi(U^t) = 1) = 0$. Similarly, we have $H(Y \mid X, U^t, \phi(U^t) = 2) = 0$. Loosely speaking, this
means that knowing the auxiliary random variables $U^t = u^t$ (representing the messages in the proof of achievability),
there are only two possible alternatives: (1) $H(Y \mid X, U^t = u^t) = 0$, that is, $Y$ can be reproduced at location $A$; (2)
$H(X \mid Y, U^t = u^t) = 0$, that is, $X$ can be reproduced at location $B$. Thus interestingly, although the goal was to only compute
a function of sources at location $B$, after $t$ messages have been communicated, each location can, in fact, reproduce
a part of the source from the other location. In the case where $X$ is not known at location $B$, $Y$ must be known at
location $A$.

To continue the proof, for any $t \in \mathbb{Z}^+$,

$$
\begin{aligned}
R^A_{sum,t} &= \min[I(X; U^t \mid Y) + I(Y; U^t \mid X)] \\
&= \min[H(X|Y) - H(X \mid Y, U^t, \phi(U^t)) + H(Y|X) - H(Y \mid X, U^t, \phi(U^t))] \\
&\overset{(a)}{=} \min[H(X|Y) - H(X \mid Y, U^t, \phi(U^t) = 2)\, p_{\phi(U^t)}(2) + H(Y|X) - H(Y \mid X, U^t, \phi(U^t) = 1)\, p_{\phi(U^t)}(1)] \\
&= \min[H(X|Y) - H(Y \oplus W \mid Y, U^t, \phi(U^t) = 2)\, p_{\phi(U^t)}(2) + H(Y|X) - H(X \oplus W \mid X, U^t, \phi(U^t) = 1)\, p_{\phi(U^t)}(1)] \\
&\geq \min[H(X|Y) - H(W \mid \phi(U^t) = 2)\, p_{\phi(U^t)}(2) + H(Y|X) - H(W \mid \phi(U^t) = 1)\, p_{\phi(U^t)}(1)] \\
&= \min[H(X|Y) + H(Y|X) - H(W \mid \phi(U^t))] \\
&\geq H(X|Y) + H(Y|X) - H(W) \\
&\overset{(b)}{=} H(X|Y),
\end{aligned}
$$

where all the minimizations above are subject to all the Markov chain and conditional entropy constraints in (3.1).
In step (a) we used the conditions $H(X \mid Y, U^t, \phi(U^t) = 1) = 0$ and $H(Y \mid X, U^t, \phi(U^t) = 2) = 0$ and in step (b) we used
the fact that $H(Y|X) = H(W) = h_2(p)$. ■
Proof of Theorem 3: This follows immediately by examining the proof of Theorem 2 and making the following
observations. Observe that $\mathcal{A}(u^t)$ can be only a subset of a row or a column. This follows from the first assumption
in the statement of the theorem that these are the only column-wise $f_B$-monochromatic rectangles of $\mathcal{X} \times \mathcal{Y}$. Next
observe that if

$$
\phi(u^t) :=
\begin{cases}
0, & \text{if } \mathcal{A}(u^t) \text{ is empty}, \\
1, & \text{if } \mathcal{A}(u^t) \text{ is a subset of a row of } \mathcal{X} \times \mathcal{Y}, \\
2, & \text{otherwise},
\end{cases}
$$

then $H(X \mid Y, U^t, \phi(U^t) = 1) = 0$ and $H(Y \mid X, U^t, \phi(U^t) = 2) = 0$ as in the proof of Theorem 2. Finally observe that
the series of information inequalities in the previous proof will continue to hold if $X \oplus W$ and $Y \oplus W$ are replaced
by $\psi(X, W)$ and $\eta(Y, W)$ respectively. This is due to the second assumption in the statement of the theorem which
also states that $H(Y|X) = H(W)$. ■



Appendix III
Theorem 4 proof

Since $0 < p, q < 1$, $p_{XY}(x, y) > 0$, $\forall (x, y) \in \mathcal{X} \times \mathcal{Y}$. Let $U^t$ be any set of auxiliary random variables in (3.3)
satisfying all the Markov chain and conditional entropy constraints of (3.1). Due to Lemma 1(i) and (iv), for any $u^t$,
$\mathcal{A}(u^t)$ is an $f_A$-monochromatic rectangle of $\mathcal{X} \times \mathcal{Y}$. Since $f_A(x, y) = x \wedge y$, $\mathcal{A}(u^t)$ can be $\{(0,0), (0,1)\}$, $\{(0,0), (1,0)\}$,
any singleton $\{(x, y)\}$, or the empty set. Let

$$
\phi(u^t) :=
\begin{cases}
0, & \text{if } \mathcal{A}(u^t) \text{ is empty}, \\
1, & \text{if } \mathcal{A}(u^t) = \{(1,1)\}, \\
2, & \text{if } \mathcal{A}(u^t) \ni (1,0), \\
3, & \text{otherwise}.
\end{cases}
$$

Since $p_{U^t}(u^t) = 0 \Leftrightarrow \mathcal{A}(u^t)$ is empty, $p_{\phi(U^t)}(0) = 0$. Therefore $p_{XY\phi(U^t)}(x, y, 0) = 0$ for all $x$ and $y$. When $X = Y = 0$,
$\phi(U^t)$ can be only 2 or 3, that is, $p_{XY\phi(U^t)}(0, 0, 0) = p_{XY\phi(U^t)}(0, 0, 1) = 0$. The condition $p_{XY\phi(U^t)}(0, 0, 0) = 0$ is obvious.
To see why $p_{XY\phi(U^t)}(0, 0, 1) = 0$ is true, note that $\phi(u^t) = 1$ if, and only if, $\mathcal{A}(u^t) = \{(1,1)\}$, which implies that
$p_{XYU^t}(0, 0, u^t) = 0$ because $\mathcal{A}(u^t)$ is the set of all $(x, y)$ for which $p_{XYU^t}(x, y, u^t) > 0$ and $(0,0)$ is not in it. Therefore,

$$p_{XY\phi(U^t)}(0, 0, 1) = \sum_{u^t:\, \mathcal{A}(u^t) = \{(1,1)\}} p_{XYU^t}(0, 0, u^t) = 0.$$

Reasoning in a similar fashion, we can summarize the relationship between $X$, $Y$, and $\phi(U^t)$ as shown in Table I. For
each value of $(x, y)$, the values of $\phi(U^t)$ shown in the table are those values for which $p_{XY\phi(U^t)}$ is possibly nonzero,
that is, for all values of $\phi$ different from those shown in the table, the value of $p_{XY\phi(U^t)}$ is zero. For example, the
entry "$X = 0$, $Y = 0$, $\phi(U^t) = 2$ or 3" means that for $l \neq 2, 3$, $p_{XY\phi(U^t)}(0, 0, l) = 0$.

TABLE I
Relation between $X$, $Y$, and $\phi(U^t)$

           Y = 0                     Y = 1
X = 0      $\phi(U^t) = 2$ or 3      $\phi(U^t) = 3$
X = 1      $\phi(U^t) = 2$           $\phi(U^t) = 1$
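The entries of Table I can be reproduced by brute-force enumeration of the $f_A$-monochromatic rectangles for the Boolean AND function. The following is a minimal sketch; the function names are illustrative, and the labelling rule is the definition of $\phi$ given in this appendix.

```python
from itertools import combinations, product


def nonempty_subsets(items):
    """All nonempty subsets of a small collection, as sets."""
    items = list(items)
    return [set(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def monochromatic_rectangles(func=lambda x, y: x & y):
    """All nonempty product sets A x B of {0,1} x {0,1} on which func is constant."""
    rects = []
    for xs in nonempty_subsets((0, 1)):
        for ys in nonempty_subsets((0, 1)):
            rect = frozenset(product(xs, ys))
            if len({func(x, y) for x, y in rect}) == 1:
                rects.append(rect)
    return rects


def phi(rect):
    """Label: 0 if empty, 1 if equal to {(1,1)}, 2 if it contains (1,0), 3 otherwise."""
    if not rect:
        return 0
    if rect == frozenset({(1, 1)}):
        return 1
    if (1, 0) in rect:
        return 2
    return 3


def table_one():
    """For each (x, y), the labels taken by rectangles that contain (x, y)."""
    rects = monochromatic_rectangles()
    return {(x, y): sorted({phi(r) for r in rects if (x, y) in r})
            for x, y in product((0, 1), repeat=2)}
```

The enumeration finds the four singletons together with $\{(0,0),(0,1)\}$ and $\{(0,0),(1,0)\}$, and `table_one()` lists, for each source pair, exactly the labels shown in the table.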
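Let $\lambda := p_{\phi(U^t)|X,Y}(2 \mid 0, 0)$; then we have

$$p_{\phi(U^t)}(2) = p_{XY}(1, 0) + \lambda\, p_{XY}(0, 0),$$

$$p_{\phi(U^t)}(3) = p_{XY}(0, 1) + (1 - \lambda)\, p_{XY}(0, 0).$$

For any $t \in \mathbb{Z}^+$,

$$
\begin{aligned}
R^A_{sum,t} &= \min[I(X; U^t \mid Y) + I(Y; U^t \mid X)] \\
&= \min[I(X; U^t, \phi(U^t) \mid Y) + I(Y; U^t, \phi(U^t) \mid X)] \\
&\geq \min[I(X; \phi(U^t) \mid Y) + I(Y; \phi(U^t) \mid X)] \\
&= \min[H(X|Y) + H(Y|X) - H(X \mid Y, \phi(U^t)) - H(Y \mid X, \phi(U^t))] \\
&= \min[h_2(p) + h_2(q) - H(X \mid Y = 0, \phi(U^t) = 2)\, p_{\phi(U^t)}(2) - H(Y \mid X = 0, \phi(U^t) = 3)\, p_{\phi(U^t)}(3)] \\
&\geq \min_{0 \leq \lambda \leq 1}\left[h_2(p) + h_2(q) - h_2\left(\frac{p_{XY}(1,0)}{p_{\phi(U^t)}(2)}\right) p_{\phi(U^t)}(2) - h_2\left(\frac{p_{XY}(0,1)}{p_{\phi(U^t)}(3)}\right) p_{\phi(U^t)}(3)\right],
\end{aligned}
$$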

where all the minimizations above except the last one are subject to all the Markov chain and conditional entropy
constraints in (3.1). Setting the derivative of the last expression with respect to $\lambda$ to zero gives
$\lambda = p_{XY}(1,0)/(p_{XY}(1,0) + p_{XY}(0,1)) = p(1-q)/(p + q - 2pq)$, at which the expression is minimized. Evaluating the minimum
value of the objective function, we have

$$R_{sum,\infty} = \lim_{t \to \infty} R^A_{sum,t} \geq h_2(p) + h_2(q) - (1 - pq)\, h_2\left(\frac{(1-p)(1-q)}{1 - pq}\right).$$